THE ANNALS
of
MATHEMATICAL
STATISTICS

(FOUNDED BY H. C. CARVER)

The Official Journal of the Institute of
Mathematical Statistics


VOLUME XX


1949




CONTENTS OF VOLUME XX 
Articles 

Anderson, T. W. and Herman Rubin. Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations . . . 46

Andrews, F. C. and Z. W. Birnbaum. On Sums of Symmetrically Truncated Normal Random Variables . . . 458

Baker, G. A. The Variance of the Proportions of Samples Falling Within a Fixed Interval for a Normal Population . . . 123

Banerjee, K. S. A Note on Weighing Design . . . 300

Bancroft, T. A. Some Recurrence Formulae in the Incomplete Beta Function Ratio . . . 451

Barankin, E. W. Locally Best Unbiased Estimates . . . 477

Berger, Agnes and Abraham Wald. On Distinct Hypotheses . . . 104

Birnbaum, Z. W. and F. C. Andrews. On Sums of Symmetrically Truncated Normal Random Variables . . . 458

Birnbaum, Z. W. and H. S. Zuckerman. A Graphical Determination of Sample Size for Wilks' Tolerance Limits . . . 313

Blom, Gunnar. A Generalization of Wald's Fundamental Identity . . . 439

Boas, R. P., Jr. Representation of Probability Distributions by Charlier Series . . . 376

Bose, R. C. A Note on Fisher's Inequality for Balanced Incomplete Block Designs . . . 619

Chernoff, Herman. Asymptotic Studentization in Testing of Hypotheses . . . 268

David, Herbert T. A Note on Random Walk . . . 603

Doob, J. L. Heuristic Approach to the Kolmogorov-Smirnov Theorems . . . 393

Dvoretzky, Aryeh. On the Strong Stability of a Sequence of Events . . . 296

Dwyer, Paul S. Pearsonian Correlation Coefficients Associated with Least Squares Theory . . . 404

Epstein, Benjamin. A Modified Extreme Value Problem . . . 99

Epstein, Benjamin. The Distribution of Extreme Values in Samples Whose Members are Subject to a Markoff Chain Condition . . . 590

Erdös, P. On a Theorem of Hsu and Robbins . . . 286

Godwin, H. J. A Note on Kac's Derivation of the Distribution of the Mean Deviation . . . 127

Godwin, H. J. Some Low Moments of Order Statistics . . . 279

Goodman, Leo A. On the Estimation of the Number of Classes in a Population . . . 572

Greenwood, Robert E. Numerical Integration for Linear Sums of Exponential Functions . . . 608

Grubbs, Frank E. On Designing Single Sampling Inspection Plans . . . 242

Halmos, Paul R. and L. J. Savage. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics . . . 225

VOLUME INDEX 


Hansen, Morris H. and William N. Hurwitz. On the Determination of Optimum Probabilities in Sampling . . . 426

Hatke, Sister Mary Agnes. A Certain Cumulative Probability Function . . . 461

Hoel, P. G. and R. P. Peterson. A Solution to the Problem of Optimum Classification . . . 433

Horton, H. Burke and R. Tynes Smith, III. A Direct Method for Producing Random Digits in Any Number System . . . 82

Howell, John M. Control Chart for Largest and Smallest Values . . . 305

Hurwitz, William N. and Morris H. Hansen. On the Determination of Optimum Probabilities in Sampling . . . 426

Kimball, Bradford F. An Approximation to the Sampling Variance of an Estimated Maximum Value of Given Frequency Based on Fit of Doubly Exponential Distribution of Maximum Values . . . 110

Lehmann, E. L. and C. Stein. On the Theory of Some Non-Parametric Hypotheses . . . 28

Lev, Joseph. The Point Biserial Coefficient of Correlation . . . 125

Levene, Howard. On a Matching Problem Arising in Genetics . . . 91

Matern, Bertil. Independence of Non-Negative Quadratic Forms in Normally Correlated Variables . . . 119

Madow, William G. On the Theory of Systematic Sampling, II . . . 333

McMillan, Brockway. Spread of Minima of Large Samples . . . 441

Mood, A. M. Tests of Independence in Contingency Tables as Unconditional Tests . . . 114

Noether, Gottfried E. On a Theorem by Wald and Wolfowitz . . . 455

Olds, Edwin G. The 5% Significance Levels for Sums of Squares of Rank Differences and a Correction . . . 117

Otter, Richard. The Multiplicative Process . . . 206

Paulson, Edward. A Multiple Decision Procedure for Certain Problems in the Analysis of Variance . . . 95

Peiser, A. M. Correction to “Asymptotic Formulas for Significance Levels of Certain Distributions” . . . 128

Peterson, R. P. and P. G. Hoel. A Solution to the Problem of Optimum Classification . . . 433

Pitman, E. J. G. and Herbert Robbins. Application of the Method of Mixtures to Quadratic Forms in Normal Variates . . . 552

Quenouille, M. H. Problems in Plane Sampling . . . 355

Quenouille, M. H. The Joint Distribution of Serial Correlation Coefficients . . . 561

Reich, Edgar. On the Convergence of the Classical Iterative Method of Solving Linear Simultaneous Equations . . . 448

Riordan, John. Inversion Formulas in Normal Variable Mapping . . . 417

Robbins, Herbert and E. J. G. Pitman. Application of the Method of Mixtures to Quadratic Forms in Normal Variates . . . 552






















Rubin, Herman and T. W. Anderson. Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations . . . 46

Savage, L. J. and Paul R. Halmos. Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics . . . 225

Sard, Arthur. Smoothest Approximation Formulas . . . 612

Seth, G. R. On the Variance of Estimates . . . 1

Sobel, Milton and Abraham Wald. A Sequential Decision Procedure for Choosing one of Three Hypotheses Concerning the Unknown Mean of a Normal Distribution . . . 502

Smith, R. Tynes, III and H. Burke Horton. A Direct Method for Producing Random Digits in Any Number System . . . 82

Stein, C. and E. L. Lehmann. On the Theory of Some Non-Parametric Hypotheses . . . 28

Tukey, John W. Sufficiency, Truncation and Selection . . . 309

Tukey, John W. Moments of Random Group Size Distributions . . . 523

von Schelling, Hermann. A Formula for the Partial Sums of Some Hypergeometric Series . . . 120

Wald, Abraham and Agnes Berger. On Distinct Hypotheses . . . 104

Wald, Abraham. Statistical Decision Functions . . . 165

Wald, Abraham. Note on the Consistency of the Maximum Likelihood Estimate . . . 595

Wald, Abraham and Milton Sobel. A Sequential Decision Procedure for Choosing one of Three Hypotheses Concerning the Unknown Mean of a Normal Distribution . . . 502

Walsh, John E. Some Significance Tests for the Median which are Valid Under Very General Conditions . . . 64

Walsh, John E. On the Range-Midrange Test and Some Tests with Bounded Significance Levels . . . 257

Walsh, John E. On the Power Function of the “Best” t-test Solution of the Behrens-Fisher Problem . . . 616

Walsh, John E. Concerning Compound Randomization in a Binary System . . . 580

Wolfowitz, J. On Wald's Proof of the Consistency of the Maximum Likelihood Estimate . . . 601

Wolfowitz, J. The Power of the Classical Tests Associated with the Normal Distribution . . . 540

Woodbury, Max A. On a Probability Distribution . . . 311

Yosida, Kôsaku. Brownian Motion on the Surface of the 3-Sphere . . . 292

Zuckerman, H. S. and Z. W. Birnbaum. A Graphical Determination of Sample Size for Wilks' Tolerance Limits . . . 313


Miscellaneous 

Abstracts of Papers...130, 317, 464, 620 

Constitution and By-Laws of the Institute.327 

































Election of Officers and Council and Revision of By-Laws, 

News and Notices. 141, 32! s 

Report on the Berkeley Meeting of the Institute. . . . , 

Report on the Boulder Meeting of the Institute. 

Report on the Cleveland Meeting of the Institute. 

Report on the New York Meeting of the Institute. 

Report on the Seattle Meeting of the Institute. 

Report of the President of the Institute. 

Report of the Secretary-Treasurer of the Institute. 

Report of the Editor. 


UO 
470, (*,24 
47.1 
f»2fi 
1.W 
1125 
151 
. 150 

M) 
103 








ON THE VARIANCE OF ESTIMATES 

By G. R. Seth 
Columbia University 

Summary. In this paper recent results on the lower bound to the variance
of unbiased estimates have been brought together. Some of them have been
extended to sequential estimates and the others have been improved to some
extent. In the last section a general method for generating a system of orthogonal
polynomials with respect to a certain class of weight functions is obtained
together with a result on the conditions under which the class of unbiased estimates
formed by all functions of an unbiased estimate consists of just one element.


1. Introduction. 

§1.1. Let $X_1, X_2, \cdots$ be a sequence of chance variables whose distribution
depends upon an unknown parameter $\theta$ and possibly also a finite number of other
parameters. It is assumed that either all the $X$'s are absolutely continuous or
that they are all discrete. Let $p_n(x_1, x_2, \cdots, x_n; \theta)$ denote the joint probability
density function or the probability of $(X_1, \cdots, X_n)$ according as the $X$'s
are continuous or discrete. Let $\theta^*(x_1, x_2, \cdots, x_n)$ be an unbiased estimate of
$\theta$, where $x_1, x_2, \cdots, x_n$ is a sequence of observations on $X_1, X_2, \cdots, X_n$.

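In this paper, we shall make use of the following short forms and abbreviations:

$E(X)$ will represent the expectation of $X$.

$\sigma^2(X)$ will represent the variance of $X$.

$E(y \mid x)$ will represent the conditional expectation of $y$, given $x$.

$\theta^*$ will represent an abbreviation of $\theta^*(x_1, x_2, \cdots, x_n)$.

$f$ will represent an abbreviation of $f(x; \theta)$ or $f(x; \theta_1, \theta_2, \cdots, \theta_T)$.

$p_n$ will represent an abbreviation of $p_n(x_1, x_2, \cdots, x_n; \theta)$ or $p_n(x_1, x_2, \cdots, x_n; \theta_1, \theta_2, \cdots, \theta_T)$.

$p_N$ will represent $p_n$ for a fixed size sample, i.e., $n = N$.

$g$ will represent an abbreviation of $g(\theta^*; \theta)$ or $g(\theta_1^*, \theta_2^*, \cdots, \theta_T^*; \theta_1, \theta_2, \cdots, \theta_T)$.

$h$ will represent $h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$ or $h(\xi_1, \xi_2, \cdots, \xi_{N-T} \mid \theta_1^*, \theta_2^*, \cdots, \theta_T^*; \theta_1, \theta_2, \cdots, \theta_T)$.

$\phi_{i_1, i_2, \cdots, i_T}(n)$ will represent $\dfrac{1}{p_n} \dfrac{\partial^{\,i_1+i_2+\cdots+i_T}}{\partial\theta_1^{i_1}\,\partial\theta_2^{i_2}\cdots\partial\theta_T^{i_T}}\, p_n$.

$g_{i_1, i_2, \cdots, i_T}$ will represent $\dfrac{1}{g} \dfrac{\partial^{\,i_1+i_2+\cdots+i_T}}{\partial\theta_1^{i_1}\,\partial\theta_2^{i_2}\cdots\partial\theta_T^{i_T}}\, g$.

$h_{i_1, i_2, \cdots, i_T}$ will represent $\dfrac{1}{h} \dfrac{\partial^{\,i_1+i_2+\cdots+i_T}}{\partial\theta_1^{i_1}\,\partial\theta_2^{i_2}\cdots\partial\theta_T^{i_T}}\, h$.

In case differentiations with respect to one parameter are involved, the last
three abbreviations will be shortened to $\phi_i(n)$, $g_i$ and $h_i$ respectively.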



In §1.1, $n$ is assumed to be a constant equal to $N$, that is, the sequence of chance
variables is finite and fixed, consisting of $X_1, X_2, X_3, \cdots, X_N$.

Cramér [1] and Rao [2] have shown that under certain conditions of regularity,
the variance of $\theta^*(x_1, x_2, \cdots, x_N)$ satisfies the inequality:

(1.1.1)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_N)) \ge \dfrac{1}{E\left(\dfrac{\partial \log p_N}{\partial\theta}\right)^2}$.

Cramér [1] has shown that the lower bound for the variance of
$\theta^*(x_1, x_2, \cdots, x_N)$ given by (1.1.1) is achieved if and only if:

(1.1.2). There exists a sufficient statistic for estimating $\theta$.

(1.1.3). The probability distribution $g(\theta^*; \theta)$ of the sufficient statistic
$\theta^*(x_1, x_2, \cdots, x_N)$ is of the form

$\theta^*(x_1, x_2, \cdots, x_N) - \theta = K \dfrac{\partial}{\partial\theta} \log g(\theta^*; \theta)$, whenever $g(\theta^*; \theta) > 0$,

where $K$ depends only upon $N$ and the parameters in the distribution.

Cramér calls the statistic $\theta^*(x_1, x_2, \cdots, x_N)$ satisfying (1.1.2) and (1.1.3)
an “efficient” statistic estimating $\theta$ and we will use the word “efficient” in this
sense alone. Bhattacharyya [3] has shown that there exists a lower bound to the
variance of $\theta^*(x_1, x_2, \cdots, x_N)$ which is higher than or equal to the one given in
(1.1.1). This lower bound is ${}_{(m)}\lambda^{11}$, that is,

(1.1.4)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_N)) \ge {}_{(m)}\lambda^{11}$,

where

$\|{}_{(m)}\lambda^{ij}\| = \|\lambda_{ij}\|^{-1}$

and

(1.1.5)  $\lambda_{ij} = E\left(\dfrac{1}{p_N}\dfrac{\partial^i p_N}{\partial\theta^i} \cdot \dfrac{1}{p_N}\dfrac{\partial^j p_N}{\partial\theta^j}\right)$,  $i, j = 1, 2, \cdots, m$,

where $m$ is any positive integer.


Let $\theta$ consist of $T$ components $\theta_1, \theta_2, \cdots, \theta_T$, and $p_N(x_1, x_2, \cdots, x_N; \theta)$
be the same as $p_N(x_1, x_2, \cdots, x_N; \theta_1, \theta_2, \cdots, \theta_T)$. Further let $\theta_1^*(x_1, x_2, \cdots, x_N)$,
$\theta_2^*(x_1, x_2, \cdots, x_N)$, $\cdots$, $\theta_T^*(x_1, x_2, \cdots, x_N)$ be unbiased estimates of $\theta_1, \theta_2,
\cdots, \theta_T$, respectively, with the non-singular covariance matrix $\|V_{ij}\|$
$(i, j = 1, 2, \cdots, T)$. Cramér [4] has proved that under certain regularity conditions,
the ellipsoid

(1.1.6)  $\sum_{i,j=1}^{T} V^{ij} t_i t_j = T + 2$

contains within itself the ellipsoid

(1.1.7)  $\sum_{i,j=1}^{T} l_{ij} t_i t_j = T + 2$,

where

(1.1.8)  $\|V^{ij}\| = \|V_{ij}\|^{-1}$,

and

(1.1.9)  $l_{ij} = E\left(\dfrac{\partial \log p_N}{\partial\theta_i} \cdot \dfrac{\partial \log p_N}{\partial\theta_j}\right)$.

This result is also implicitly contained in Rao [2].

§1.2. Let us now take $n$ as a chance variable determined by a sequential procedure.
$X_1, X_2, X_3, \cdots$ is a sequence of chance variables having the same
probability density or probability $f(x; \theta)$, according as $X$ is absolutely continuous
or discrete. The sequential process tells us, after each successive observation
has been drawn, whether the next observation is to be taken or not. Thus
$n$ will denote the total number of observations taken by the time the sequential
process has been completed. Under certain regularity conditions, Wolfowitz
[5] has shown that if $\theta^*(x_1, x_2, \cdots, x_n)$ is an unbiased estimate of $\theta$, then

(1.2.1)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \ge \dfrac{1}{En \cdot E\left(\dfrac{\partial}{\partial\theta} \log f(x; \theta)\right)^2}$.


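Furthermore, if $\theta$ consists of $T$ components, $\theta_1, \theta_2, \cdots, \theta_T$, and
$\theta_1^*(x_1, x_2, \cdots, x_n)$, $\theta_2^*(x_1, x_2, \cdots, x_n)$, $\cdots$, $\theta_T^*(x_1, x_2, \cdots, x_n)$ are unbiased
estimates of $\theta_1, \theta_2, \cdots, \theta_T$ respectively, Wolfowitz [5] has proved that

(1.2.2)  $\sum_{i,j=1}^{T} l_{ij} t_i t_j = T + 2$

is contained within the ellipsoid

(1.2.3)  $\sum_{i,j=1}^{T} V^{ij} t_i t_j = T + 2$,

where

$l_{ij} = En \cdot E\left(\dfrac{\partial \log f}{\partial\theta_i} \cdot \dfrac{\partial \log f}{\partial\theta_j}\right)$,  $i, j = 1, 2, \cdots, T$.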


Blackwell and Girshick [6] have shown that the lower bound given by (1.2.1)
for the variance of an unbiased estimate of $\theta$ is attained only for the sequential
process for which $\Pr(n = N) = 1$, if the probability density function $f(x; \theta)$ of
$X$ is such that $E(X) = \theta$ and $x_1 + x_2 + x_3 + \cdots + x_M$ is a sufficient statistic for
all integral values of $M$, for estimating $\theta$; $x_1, x_2, \cdots, x_M$ being $M$ independent
observations on the chance variable $X$.

In this paper the following results have been obtained. The specific conditions
under which the results hold are stated at their proper places along with
the results:

(1.3.1). The lower bound in (1.1.4) is valid when $n$ is considered a
chance variable determined by a sequential procedure instead of being a fixed
number $N$.

(1.3.2). The concentration ellipsoid defined in (1.2.3) contains within itself
another ellipsoid

$\sum_{i,j=1}^{T} \mu_{ij} t_i t_j = T + 2$,

where $\mu_{ij}$ is given by (3.1.18), which in turn contains the ellipsoid given by
(1.2.2).

(1.3.3). The Blackwell and Girshick result [6] for the achievement of the lower
bound for the variance of unbiased estimates given by (1.2.1) has been extended
to the case where the probability density (or probability) $\prod_{i=1}^{M} f(x_i; \theta)$, for all
fixed $M \ge N$, where $N$ is the least value for which $\Pr(n = N) \ne 0$, has an
unbiased “efficient” estimate for $\theta$ in the sense defined by Cramér. This is
illustrated by two examples of Wald sequential procedures.

(1.3.4). Let $N$ be fixed and $p_N(x_1, x_2, \cdots, x_N; \theta) \cdot |J| = g(\theta^*; \theta) \cdot
h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$, where $J$ denotes the Jacobian of the transformation
from $x_1, x_2, \cdots, x_N$ to $\theta^*, \xi_1, \xi_2, \cdots, \xi_{N-1}$. Here $g(\theta^*; \theta)$ and
$h(\xi_1, \xi_2, \cdots, \xi_{N-1} \mid \theta^*; \theta)$ are respectively the probability density function (or
probability) of $\theta^*$ and the conditional probability density function (or probability)
of $\xi_1, \xi_2, \cdots, \xi_{N-1}$ for a given value of $\theta^*$.

The necessary and sufficient conditions under which the lower bound for the
variance of unbiased estimates given by Bhattacharyya [3] may be achieved are
that there should exist a statistic $\theta^*(x_1, x_2, \cdots, x_N)$ such that:

(a) $h_1, h_2, \cdots, h_m$ are linearly dependent considered as functions of $\xi_1,
\xi_2, \cdots, \xi_{N-1}$ for given values of $\theta$ and $\theta^*(x_1, x_2, \cdots, x_N)$, and

(b) the probability density $g(\theta^*; \theta)$ of $\theta^*(x_1, x_2, \cdots, x_N)$ satisfies the following
equation:

$\theta^*(x_1, x_2, \cdots, x_N) - \theta = \sum_{i=1}^{m} \dfrac{K_i}{g(\theta^*; \theta)} \dfrac{\partial^i}{\partial\theta^i}\, g(\theta^*; \theta)$,

where $K_i$ are independent of the $x_1, x_2, x_3, \cdots, x_N$.

Equivalent conditions for the multiparameter case have also been given.
(1.3.5). The following properties of $\phi_1(n), \phi_2(n), \cdots$ are derived:

(a) Under certain conditions $\phi_2(N), \cdots$ form a system of orthogonal
polynomials in $\phi_1(N)$, the weight function being $p_N(x_1, x_2, \cdots, x_N; \theta)$.

(b) $\sum K_i \phi_i(n)$ cannot be a function of $x_1, \cdots, x_n$ independent of $\theta$
except for the constant zero.

(C) 12™’ 7 t'l'K 13 liuea I 1 7 depeadeat u P° n *»<»). then no other 
eltw T , f t ° rm 00 (3:1 > ** + b where a and h are 

fl 3 Tf i “ dependent of e ’ can be linearly related with &(n). 

(1.3.6). If a) $\theta^*(x_1, x_2, \cdots, x_N)$ is an unbiased estimate of $\theta$ and b) if among
all functions of $\theta^*(x_1, x_2, \cdots, x_N)$ which are unbiased estimates of $\theta$ with finite
variance, $\theta^*$ is the one with the least variance and such that the set of polynomials
with respect to the distribution function of $\theta^*$ is complete, then there is
no other function of $\theta^*$ having a finite variance which is an unbiased estimate of $\theta$.

2. Estimation of a single parameter. 

§2.1. Let $X_1, X_2, \cdots$ and $p_M(x_1, x_2, \cdots, x_M; \theta)$ be as given in the first paragraph
of §1.1. Let $\Omega$ be the space of all possible infinite sequences $(\omega)$ of observations
$x_1, x_2, \cdots$. Let there be given an infinite sequence of Borel measurable
functions $\Phi_1(x_1), \Phi_2(x_1, x_2), \cdots, \Phi_j(x_1, x_2, \cdots, x_j), \cdots$, defined for
all observable sequences in $\Omega$ such that each takes only the values zero and one.
We further assume that everywhere in $\Omega$, except possibly on a set whose probability
is zero for all $\theta$ under consideration, at least one of the functions $\Phi_1(x_1)$,
$\Phi_2(x_1, x_2), \cdots$ takes the value of one. Let $n$ be the smallest integer for which
this occurs. Thus $n(\omega)$ is a chance variable. The sequential process is then
defined as follows:

Take an observation and find $\Phi_1(x_1)$. If it is unity, the sampling process stops;
otherwise continue sampling. If a second observation is taken and the value of
$\Phi_2(x_1, x_2)$ is unity, the process stops; otherwise continue sampling, and so on.
In general, if after taking $j$ observations

$\Phi_i(x_1, x_2, \cdots, x_i) = 0$ for $i = 1, 2, \cdots, j - 1$,

and $\Phi_j(x_1, x_2, \cdots, x_j) = 1$, sampling stops; otherwise it is continued. We
will denote by $R_j$ the set of all points $(x_1, x_2, \cdots, x_j)$ for which the process
stops with the $j$th observation.
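
The stopping rule described above can be sketched in code. This is only a minimal illustration and is not part of the original paper: the names `stopping_time` and `leaves_band` are hypothetical, the whole family $\Phi_1, \Phi_2, \cdots$ is represented by one Python function that receives the first $j$ observations and returns 0 or 1, and the particular rule shown (stop once the running sum leaves a band) is an assumed example.

```python
from typing import Callable, Iterable, List, Tuple


def stopping_time(
    observations: Iterable[float],
    phi: Callable[[List[float]], int],
) -> Tuple[int, List[float]]:
    """Return (n, [x_1, ..., x_n]), n being the smallest j with phi(x_1..x_j) == 1."""
    taken: List[float] = []
    for x in observations:
        taken.append(x)          # take the next observation
        if phi(taken) == 1:      # Phi_j(x_1, ..., x_j) = 1: sampling stops
            return len(taken), taken
    # The text assumes some Phi_j equals one with probability one; a finite
    # input stream may still run out before that happens.
    raise ValueError("the rule did not stop on the supplied observations")


def leaves_band(xs: List[float], width: float = 2.0) -> int:
    """Hypothetical rule: 1 once the running sum is at least `width` in absolute value."""
    return 1 if abs(sum(xs)) >= width else 0


n, stopped_sample = stopping_time([1.0, -0.5, 1.0, 1.0, 3.0], leaves_band)
```

With these inputs the running sums are 1.0, 0.5, 1.5 and 2.5, so the sketch stops with the fourth observation; in the notation of the text the point $(x_1, \cdots, x_4)$ belongs to $R_4$.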

Let $\theta^*(x_1, x_2, \cdots, x_n)$ be a statistic whose expectation is a real valued function
$\gamma(\theta)$ of $\theta$. The development proceeds on the assumption that
$p_M(x_1, x_2, \cdots, x_M; \theta)$ is a probability density function. The result is equally
valid if $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the probability of discrete variables $X_1,
X_2, \cdots, X_M$ provided that integration is replaced by summation whenever
this is required. Further the phrase “almost all points” in a Euclidean space of
any finite dimensionality is understood to mean all points in the space with the
following possible exceptions:

(a). A set of Lebesgue measure zero where $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the probability
density function;

(b). The points which belong to the set $Z$, where $p_M(x_1, x_2, \cdots, x_M; \theta)$ is the
probability function of the discrete chance variables $X_1, X_2, \cdots, X_M$. The
set $Z$ consists of all points $(x_1, x_2, \cdots, x_M)$ such that $p_M(x_1, \cdots, x_M; \theta) =
0$ identically for all $\theta$ under consideration.

§2.2. Conditions of regularity. We will postulate the following conditions to
be satisfied by $p_M(x_1, x_2, \cdots, x_M; \theta)$ and $\theta^*(x_1, x_2, \cdots, x_n)$.

(2.2.1). $\theta^*(x_1, x_2, \cdots, x_n)$ has an expectation $\gamma(\theta)$ and a finite variance. All
the derivatives of $\gamma(\theta)$ are assumed to be finite. The parameter $\theta$ lies in an open
interval $D$ of the real line. $D$ may consist of the entire line or an entire half line.





(2.2.2). The derivatives

$\dfrac{\partial^i p_M}{\partial\theta^i}$,  $(i = 1, 2, \cdots, m)$,

exist for all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_M)$ in $R_M$ and for all $M$. We define

$\dfrac{1}{p_M}\dfrac{\partial^i p_M}{\partial\theta^i} = 0$,

whenever $p_M(x_1, x_2, \cdots, x_M; \theta) = 0$; thus,

$\dfrac{1}{p_M}\dfrac{\partial^i p_M}{\partial\theta^i} = \phi_i(M)$

is defined for all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_M)$ in $R_M$.

(2.2.3). For any integral $j$ there exist non-negative L-measurable functions
$T_i(x_1, x_2, \cdots, x_j)$, $(i = 1, 2, \cdots, m)$, such that

(a) $\left|\theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i}{\partial\theta^i}\, p_j(x_1, x_2, \cdots, x_j; \theta)\right| < T_i(x_1, x_2, \cdots, x_j)$
for all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_j)$ in $R_j$,

(b) $\int_{R_j} T_i(x_1, x_2, \cdots, x_j) \prod_{u=1}^{j} dx_u$,  $(i = 1, 2, \cdots, m)$,
are finite.

(2.2.4). Let $t_j(\theta) = \int_{R_j} \theta^*(x_1, \cdots, x_j)\, p_j(x_1, x_2, \cdots, x_j; \theta) \prod_{u=1}^{j} dx_u$.

We postulate the uniform convergence of

$\sum_{j=1}^{\infty} \dfrac{\partial^i}{\partial\theta^i}\, t_j(\theta)$,  $(i = 1, 2, \cdots, m)$

(the existence of $\dfrac{\partial^i}{\partial\theta^i}(t_j(\theta))$ is assured by the assumption (2.2.3)).

(2.2.5). There exist functions $S_i(x_1, x_2, \cdots, x_j)$ for every $j$, $(i = 1, 2, \cdots, m)$,
such that when $\theta^*(x_1, x_2, \cdots, x_j)$ and $T_i(x_1, x_2, \cdots, x_j)$ are replaced by
unity and $S_i(x_1, x_2, \cdots, x_j)$ respectively, conditions (2.2.3) and (2.2.4) still
hold good.

(2.2.6). The covariance matrix of $\phi_i(n)$ $(i = 1, \cdots, m)$ exists and is non-singular
for almost all $\theta$ in $D$ and almost all $(x_1, x_2, \cdots, x_n)$.

§2.3. Let us consider the sequential process mentioned in §2.1 and the functions
$\theta^*(x_1, x_2, \cdots, x_n)$ and $p_M(x_1, x_2, \cdots, x_M; \theta)$ which satisfy the regularity
conditions in §2.2. We will now find a lower bound for the variance of such estimates.

Let us examine

(2.3.1)  $F = E\left(\theta^*(x_1, x_2, \cdots, x_n) - \gamma(\theta) - \sum_{i=1}^{m} K_i \phi_i(n)\right)^2$,





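where $K_i$ $(i = 1, 2, \cdots, m)$ are independent of $(x_1, x_2, \cdots, x_n)$. Now (2.3.1)
can be written as

(2.3.2)  $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - 2 \sum_{i=1}^{m} K_i E(\theta^*(x_1, x_2, \cdots, x_n)\phi_i(n)) + 2\gamma(\theta) \sum_{i=1}^{m} K_i E\phi_i(n) + \sum_{i,j=1}^{m} K_i K_j \lambda_{ij}$,

where

(2.3.3)  $\lambda_{ij} = E(\phi_i(n)\phi_j(n))$,  $(i, j = 1, 2, \cdots, m)$.

Now

(2.3.4)  $E(\theta^*(x_1, x_2, \cdots, x_n)\phi_i(n)) = \sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u$.

We also know that

(2.3.5)  $\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, p_j \prod_{u=1}^{j} dx_u = \gamma(\theta)$.

Differentiating both sides of (2.3.5) $i$ times $(i = 1, 2, \cdots, m)$ we have, because
of conditions (2.2.3) and (2.2.4):

(2.3.6)  $\sum_{j=1}^{\infty} \int_{R_j} \theta^*(x_1, x_2, \cdots, x_j)\, \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u = \dfrac{d^i\gamma(\theta)}{d\theta^i}$,  $(i = 1, 2, \cdots, m)$.

From (2.3.4) and (2.3.6), we obtain

(2.3.7)  $E(\theta^*(x_1, x_2, \cdots, x_n)\phi_i(n)) = \dfrac{d^i\gamma(\theta)}{d\theta^i}$.

Differentiating

(2.3.8)  $1 = \sum_{j=1}^{\infty} \int_{R_j} p_j \prod_{u=1}^{j} dx_u$

$i$ times $(i = 1, 2, \cdots, m)$ with respect to $\theta$, we obtain because of conditions
(2.2.5)

(2.3.9)  $0 = \sum_{j=1}^{\infty} \int_{R_j} \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u$,  $(i = 1, \cdots, m)$.

(2.3.8) is valid on account of the type of sequential process (§2.1). Now

(2.3.10)  $E(\phi_i(n)) = \sum_{j=1}^{\infty} \int_{R_j} \dfrac{\partial^i p_j}{\partial\theta^i} \prod_{u=1}^{j} dx_u = 0$,  $(i = 1, \cdots, m)$.

By (2.3.7) and (2.3.10), (2.3.2) reduces to

(2.3.11)  $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - 2 \sum_{i=1}^{m} K_i \dfrac{d^i\gamma(\theta)}{d\theta^i} + \sum_{i,j=1}^{m} K_i K_j \lambda_{ij}$.

Now $\|\lambda_{ij}\|$ being non-singular on account of condition (2.2.6), we get just
one set of values of $K$'s which minimize $F$. These values are given by

(2.3.12)  $K_i = \sum_{j=1}^{m} {}_{(m)}\lambda^{ij}\, \dfrac{d^j\gamma(\theta)}{d\theta^j}$,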




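where

(2.3.13)  $\|{}_{(m)}\lambda^{ij}\| = \|\lambda_{ij}\|^{-1}$,  $(i, j = 1, 2, \cdots, m)$.

Putting the above values of $K_j$ $(j = 1, 2, \cdots, m)$ in (2.3.11), we obtain

(2.3.14)  $F = \sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) - \sum_{i,j=1}^{m} {}_{(m)}\lambda^{ij}\, \dfrac{d^i\gamma(\theta)}{d\theta^i}\, \dfrac{d^j\gamma(\theta)}{d\theta^j}$.

Hence, $F$ being non-negative by (2.3.1), we have

(2.3.15)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \ge \sum_{i,j=1}^{m} {}_{(m)}\lambda^{ij}\, \dfrac{d^i\gamma(\theta)}{d\theta^i}\, \dfrac{d^j\gamma(\theta)}{d\theta^j}$.

Thus the right-hand side of the above inequality gives the lower bound to the variance of
unbiased estimates of $\gamma(\theta)$.¹ When $\gamma(\theta) = \theta$, the above reduces to

(2.3.16)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \ge {}_{(m)}\lambda^{11}$.

When $m = 1$ and $p_n(x_1, x_2, \cdots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$, (2.3.16) reduces to

(2.3.17)  $\sigma^2(\theta^*(x_1, x_2, \cdots, x_n)) \ge \dfrac{1}{En \cdot E\left(\dfrac{\partial}{\partial\theta} \log f(x; \theta)\right)^2}$,

which is the result given by Wolfowitz [5].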

When $n$, the chance variable, is constant and equal to $N$, then (2.3.15) and
(2.3.16) correspond to those given by Bhattacharyya [3]. Although the conditions
of regularity under which Bhattacharyya proves his results are not clear
from his paper, they are likely to be slightly different from those in §2.3, as the
results in [3] are obtained only for a fixed size sample.

§2.4. We will now investigate the necessary and sufficient conditions under
which the lower bound given in (2.3.16) is actually higher than that given in
(2.3.17).

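We can easily see that

(2.4.1)  ${}_{(m)}\lambda^{11} = \dfrac{1}{\lambda_{11}(1 - R^2_{1.23\cdots m})}$,

where $R_{1.23\cdots m}$ is the multiple correlation coefficient between $\phi_1(n)$ and $\phi_2(n),
\cdots, \phi_m(n)$. The excess of the lower bound given by (2.3.16) over that obtained when we use $m = 1$
is given by

(2.4.2)  $\dfrac{1}{\lambda_{11}(1 - R^2_{1.23\cdots m})} - \dfrac{1}{\lambda_{11}}$,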


estimates has been o^^dbrme^Mg^eTMs^r 1, b °-i Dd t0 variance of unl) i»flecl 
in an unpublished paper by A. Wald. Inden P nri P ti ®, ai i mlar result for fixed size samplea 
in a paper not yet published P ° y 0 Ste,ri has obtained the same result 



which is further equal to

(2.4.3)  $\dfrac{R^2_{1.23\cdots m}}{\lambda_{11}(1 - R^2_{1.23\cdots m})}$.

Thus the lower bound for the variance of unbiased estimates of $\theta$ obtained
by using $m > 1$ is higher than that obtained by employing $m = 1$ if and only if
$R_{1.23\cdots m}$ is not zero for some $m \ge 2$. This is equivalent to the condition that for
at least one $i \ge 2$, $\lambda_{1i}$, the correlation coefficient between $\phi_1(n)$ and $\phi_i(n)$ $(i > 1)$,
is different from zero. Suppose further that we have used $m = a$ and that we
wish to find the increase in the lower bound if $a$ were replaced by $a + 1$. The
increase in this case is given by

(2.4.4)  $\dfrac{\rho^2_{1(a+1).23\cdots a}}{\lambda_{11}(1 - R^2_{1.23\cdots(a+1)})}$,

where $\rho_{1(a+1).23\cdots a}$ is the partial correlation coefficient between $\phi_1(n)$ and $\phi_{a+1}(n)$
keeping $\phi_2(n), \cdots, \phi_a(n)$ fixed. It is greater than zero if and only if $\rho_{1(a+1).23\cdots a}$
is not equal to zero.
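
The relation between the first diagonal element of the inverse matrix and the multiple correlation coefficient can be checked numerically. The sketch below is an illustration only: the $3 \times 3$ matrix `lam` is a made-up positive definite example standing in for $\|\lambda_{ij}\|$ with $m = 3$, and the helper names (`solve`, `bound_from_inverse`, `multiple_r_squared`) are hypothetical. It uses the standard expression $R^2 = b' \Lambda_{22}^{-1} b / \lambda_{11}$, where $b$ holds $\lambda_{12}, \cdots, \lambda_{1m}$ and $\Lambda_{22}$ is the matrix of the remaining $\lambda_{ij}$.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def bound_from_inverse(lam: Matrix) -> float:
    """First diagonal element of the inverse of lam."""
    e1 = [1.0] + [0.0] * (len(lam) - 1)
    return solve(lam, e1)[0]


def multiple_r_squared(lam: Matrix) -> float:
    """Squared multiple correlation of the first variable on the others."""
    b = lam[0][1:]
    lam22 = [row[1:] for row in lam[1:]]
    w = solve(lam22, b)
    return sum(bi * wi for bi, wi in zip(b, w)) / lam[0][0]


# Made-up positive definite matrix playing the role of ||lambda_ij||, m = 3.
lam = [[2.0, 1.0, 0.5],
       [1.0, 3.0, 1.0],
       [0.5, 1.0, 4.0]]
r2 = multiple_r_squared(lam)
lhs = bound_from_inverse(lam)               # element (1, 1) of the inverse
rhs = 1.0 / (lam[0][0] * (1.0 - r2))        # 1 / (lambda_11 (1 - R^2))
excess = lhs - 1.0 / lam[0][0]              # gain over the m = 1 bound
```

In this example `lhs` and `rhs` agree up to rounding, and `excess` equals $R^2 / (\lambda_{11}(1 - R^2))$, so the bound rises exactly when $R^2 > 0$.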

§2.5. If $p_n(x_1, x_2, \cdots, x_n; \theta_1)$ also depends upon a finite number of other
parameters $\theta_2, \theta_3, \cdots, \theta_T$, then a lower bound higher than or equal to that
given in (2.3.16) can be obtained by using

$\sum_{i_1+i_2+\cdots+i_T \le m} K_{i_1, i_2, \cdots, i_T}\, \phi_{i_1, i_2, \cdots, i_T}(n)$ instead of $\sum_{i_1=1}^{m} K_{i_1} \phi_{i_1}(n)$ in (2.3.1).

The lower bound in this case is given by (3.1.14) (see section 3) by taking $s = 1$,
that is,

(2.5.1)  $\sigma^2(\theta_1^*(x_1, x_2, \cdots, x_n)) \ge C(1, 1)$,

where $C(1, 1)$ is the element in the first row and first column of the inverse of $W$
defined in (3.1.9).

The result for $n = N$, $N$ fixed, is obtained by Bhattacharyya [3, 1947]. Let
us illustrate it by an example. Take samples of fixed size $N$. Suppose we are
required to find the lower bound to the variance of unbiased estimates of $\theta_1$
in the normal population

(2.5.2)  $f(x; \theta_1, \theta_2) = \dfrac{1}{\sqrt{2\pi\theta_1}} \exp\left(-\dfrac{(x - \theta_2)^2}{2\theta_1}\right)$

on the basis of $N$ independent observations $x_1, x_2, \cdots, x_N$. The lower bound
for the variance of the unbiased estimates of $\theta_1$, when we use
$\sum_{i_1=1}^{m} K_{i_1} \phi_{i_1}(N)$ in (2.3.1), is given by $\dfrac{2\theta_1^2}{N}$.

However, if $\sum_{i_1+i_2 \le 2} K_{i_1, i_2}\, \phi_{i_1, i_2}(N)$ is used, the lower bound, by the help of
(2.5.1), is found to be equal to $2\theta_1^2/(N - 1)$. In fact there exists the statistic

$\dfrac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$

whose variance is equal to $2\theta_1^2/(N - 1)$, where $\bar{x} = \sum_{i=1}^{N} \dfrac{x_i}{N}$. Thus the use of
$\sum_{i_1+i_2 \le 2} K_{i_1, i_2}\, \phi_{i_1, i_2}(N)$ brings into relief the unbiased estimate with the
least variance.
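
The claim that the usual sample variance has variance $2\theta_1^2/(N - 1)$, above the first-order bound $2\theta_1^2/N$, can be illustrated by simulation. The following is a rough Monte Carlo sketch, not a computation from the paper: the function names, the seed, and the values $N = 5$, $\theta_1 = 2$ (population variance), $\theta_2 = 1$ (population mean) are all assumptions chosen for the illustration.

```python
import random
from typing import List, Tuple


def sample_variance(xs: List[float]) -> float:
    """Unbiased sample variance with divisor len(xs) - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate(n_obs: int, theta1: float, theta2: float,
             reps: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo mean and variance of the sample variance.

    theta1 is the population variance, theta2 the population mean.
    """
    rng = random.Random(seed)
    sd = theta1 ** 0.5
    values = [
        sample_variance([rng.gauss(theta2, sd) for _ in range(n_obs)])
        for _ in range(reps)
    ]
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / reps
    return mean, var


N, THETA1, THETA2 = 5, 2.0, 1.0
mc_mean, mc_var = simulate(N, THETA1, THETA2, reps=100_000)
first_order_bound = 2 * THETA1 ** 2 / N           # 1.6
second_order_bound = 2 * THETA1 ** 2 / (N - 1)    # 2.0
```

Under these assumptions `mc_mean` should lie close to $\theta_1 = 2$ and `mc_var` close to `second_order_bound`, visibly above `first_order_bound`; the agreement is only up to Monte Carlo error.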


3. Multi-parameter case. In this section we will prove the result mentioned 
in (1.3.2) of §1.3. 

§3.1. Let $\theta$ consist of $T$ components $(\theta_1, \theta_2, \cdots, \theta_T)$ and $\theta_1^*, \theta_2^*, \cdots, \theta_T^*$ be
unbiased estimates of $\theta_1, \theta_2, \cdots, \theta_T$ respectively. Also, let a sequential process
of the type described in §2.1 be given. We postulate the following regularity conditions:

(3.1.1). The covariance matrix $\|V_{ij}\|$ of the estimates $\theta_i^*$ $(i = 1, 2, \cdots, T)$ is
non-singular in $D$, where $D$ is an open interval of the $T$-dimensional parameter
space.

(3.1.2). The conditions of section (2.2) are satisfied for each one of
$\theta_i^*$ $(i = 1, 2, \cdots, T)$ and $\phi_{i_1, i_2, \cdots, i_T}(n)$, $(i_1 + i_2 + \cdots + i_T \le m)$.

(3.1.3). The covariance matrix of $\phi_{i_1, i_2, \cdots, i_T}(n)$, $i_1 + i_2 + \cdots + i_T \le m$, exists
and is non-singular.

Under the assumptions (3.1.1)-(3.1.3), we prove the result
(1.3.2) in section 1.3.

Proof: Using the same arguments of §2.3, we obtain

(3.1.4)  $E(\phi_{i_1, i_2, \cdots, i_T}(n)) = 0$,

(3.1.5)  $E(\theta_j^*(x_1, x_2, \cdots, x_n)\, \phi_{i_1, i_2, \cdots, i_T}(n)) = 1$ if $i_j = 1$, $i_p = 0$, $p \ne j$, $(j = 1, 2, \cdots, T)$, and $= 0$ otherwise.

Let the covariance matrix of $\theta_j^*$ $(j = 1, 2, \cdots, s;\ s \le T)$ and $\phi_{i_1, i_2, \cdots, i_T}(n)$,
$(i_1 + i_2 + \cdots + i_T \le m)$ be given by

(3.1.6)  $U = \begin{Vmatrix} A & B \\ B' & W \end{Vmatrix}$,

where

(3.1.7)  $A = \|V_{ij}\|$,  $i, j = 1, 2, \cdots, s$;  $s \le T$;

(3.1.8)  $B = \|I_s \;\; 0\|$;

(3.1.9)  and $W$ = covariance matrix of the set
$[\phi_{i_1, i_2, \cdots, i_T}(n);\ i_1 + i_2 + \cdots + i_T \le m]$,
arranged such that the $j$th term in the leading diagonal is given by

(3.1.10)  $E(\phi^2_{i_1, i_2, \cdots, i_T}(n))$, where $i_j = 1$, $i_p = 0$, $p \ne j$, $(j = 1, 2, \cdots, T)$,

and $B'$ is the transpose of $B$.

As $U$ is positive semi-definite, we have

(3.1.11)  $|U| \ge 0$.




The above can further be reduced to

(3.1.12)  $|W| \cdot |A - B W^{-1} B'| \ge 0$,

which leads to

(3.1.13)  $|A - B W^{-1} B'| \ge 0$, as $W$ is positive definite.

By the use of (3.1.8) we obtain from above

(3.1.14)  $|A - C| \ge 0$,

where $C$ is the top left part of $W^{-1}$, consisting of $s$ rows and $s$ columns.
Let us now consider the matrix

(3.1.15)  $\|V_{ij} - v^{ij}\|$,  $(i, j = 1, 2, \cdots, T)$,

where $\|v^{ij}\|$ is the top left part of $W^{-1}$ consisting of $T$ rows and $T$ columns, and
is equal to

(3.1.16)  $\|W_{11} - W_{12} W_{22}^{-1} W_{21}\|^{-1}$,

when $W$ is written as

(3.1.17)  $W = \begin{Vmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{Vmatrix}$,

where $W_{11}$ has $T$ rows and $T$ columns.

By the repeated application of (3.1.14), we are led to the conclusion that all
the leading minors of the matrix in (3.1.15) are either positive or zero. Hence
the matrix in (3.1.15) is semi-positive definite.

If now we put

(3.1.18)  $\|\mu_{ij}\| = \|v^{ij}\|^{-1}$,

we obtain

(3.1.19)  $\|\mu_{ij} - V^{ij}\|$

to be semi-positive definite. Thus the ellipsoid

(3.1.20)  $\sum_{i,j=1}^{T} V^{ij} t_i t_j = T + 2$

contains within itself the ellipsoid

(3.1.21)  $\sum_{i,j=1}^{T} \mu_{ij} t_i t_j = T + 2$.

Cramér calls the ellipsoid in (3.1.20) a “concentration” ellipsoid.

We will now show that the ellipsoid given by (3.1.21) contains within itself
the ellipsoid

(3.1.22)  $\sum_{i,j=1}^{T} l_{ij} t_i t_j = T + 2$,

12 


G. n. SETH 


where $\|l_{ij}\|$ is the top left matrix given by $W_{11}$ in (3.1.17). We will prove
the above by showing

(3.1.23)  $\|l_{ij} - \mu_{ij}\|$,  $(i, j = 1, 2, \cdots, T)$,

to be semi-positive definite.

We obtain, from (3.1.16) and (3.1.18),

(3.1.24)  $\|\mu_{ij}\| = W_{11} - W_{12} W_{22}^{-1} W_{21}$,  $(i, j = 1, 2, \cdots, T)$.

From the above it follows that

(3.1.25)  $\|l_{ij} - \mu_{ij}\| = W_{12} W_{22}^{-1} W_{21}$.

The matrix on the right hand side is semi-positive definite since $W_{22}$ is
positive definite; hence the ellipsoid (3.1.21) contains within itself the ellipsoid
given by (3.1.22). This proves the assertion made in (1.3.2) of §1.3. It
may be seen that (3.1.22) is strictly contained in (3.1.21) if and only if $W_{12} \ne 0$.
It may be mentioned that in this section as well as elsewhere, $T + 2$, appearing
on the right hand side of the equation of an ellipsoid, can be replaced by any
positive constant. Also the ellipsoid in (3.1.21) depends upon the choice of $m$
and it can be shown that for any two positive integers $m_1, m_2$ $(m_1 > m_2)$ the ellipsoid
for $m = m_1$ contains within itself the one for $m = m_2$.

§3.2. In general, let $\theta_i^*(x_1, x_2, \cdots, x_n)$ be statistics whose expectations are
$\gamma_i(\theta_1, \theta_2, \cdots, \theta_T)$, $(i = 1, 2, \cdots, T)$, the latter being assumed to admit partial
derivatives of all possible orders. Under the postulates enumerated in §3.1,
we see that the ellipsoid in (3.1.20) contains within itself the ellipsoid

(3.2.1)  $\sum_{i,j=1}^{T} S_{ij} t_i t_j = T + 2$,

where

(3.2.2)  $\|S_{ij}\| = \|R W^{-1} R'\|^{-1}$,  $i, j = 1, 2, \cdots, T$,

and

(3.2.3)  $R = \left\| \dfrac{\partial^{\,i_1+i_2+\cdots+i_T}}{\partial\theta_1^{i_1}\,\partial\theta_2^{i_2}\cdots\partial\theta_T^{i_T}}\, \gamma_j(\theta_1, \theta_2, \cdots, \theta_T) \right\|$,  $(j = 1, 2, \cdots, T;\ i_1 + i_2 + \cdots + i_T \le m)$,

where $j$ and $i_1 + i_2 + \cdots + i_T$ indicate the number of the row and the column
respectively and is arranged to correspond to the arrangement of $W$, where $W$
is the same as given in (3.1.9).


4. Achievement of the different lower bounds. In §4.1 we will demonstrate
the desirability of finding a higher lower bound to the variance of sequential
estimates than that given by Wolfowitz, by giving two examples in which the
latter is not achieved. From §2.4 it is clear that this will be so if $E(\phi_1(n) \cdot \phi_i(n))$
is not zero for at least one value of $i \ge 2$. We will demonstrate that this is true




for $i = 2$. In §4.2 we show that if an “efficient” statistic exists for all $M > N$,
the bound is achieved only in the case when the sample size is fixed. In §4.3
we obtain necessary and sufficient conditions for the attainment of the bound
given in (1.1.4). In §4.4 we discuss the conditions under which there exists a
“concentration ellipsoid” which coincides with the ellipsoid given in (3.1.21) for
samples of fixed size $N$.

§4.1. Ex. 1. The Wald sequential procedure for testing $\theta = \theta_1$ against $\theta = \theta_2$
in a normal population

(4.1.1) $f(x; \theta) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta)^2}$

is given as follows: If

(4.1.2) $B < \sum_{i=1}^{s} \left(x_i - \tfrac{1}{2}(\theta_1 + \theta_2)\right) < A$, $(s = 1, 2, \dots, j - 1)$,

and

(4.1.3) $\sum_{i=1}^{j} \left(x_i - \tfrac{1}{2}(\theta_1 + \theta_2)\right)$ is either $> A$ or $< B$,

we cease sampling and make a decision. Here $A$ and $B$ are constants fixed by the
probability levels of making a correct decision.

Let us denote the set of points satisfying (4.1.2) and (4.1.3) by $R_j$. In this
case

(4.1.4) $\phi_1(n) = \sum_{i=1}^{n} (x_i - \theta) = Z_n - n\theta$, where $Z_n = \sum_{i=1}^{n} x_i$.

The above is differentiable with respect to $\theta$. On differentiating we have

(4.1.5) $\phi_2(n) = (Z_n - n\theta)^2 - n$.

Now

(4.1.6) $E(\phi_1(n) \cdot \phi_2(n)) = E(Z_n - n\theta)^3 - E(n(Z_n - n\theta))$.

By theorem 7.3, Wolfowitz [5],

(4.1.7) $E(Z_n - n\theta)^3 = En \cdot E(X - \theta)^3 + 3E(n(Z_n - n\theta))$,

where $X$ has the distribution given in (4.1.1). As $E(X - \theta)^3$ is equal to zero,
(4.1.6) reduces to

(4.1.8) $E(\phi_1(n) \cdot \phi_2(n)) = 2E(n(Z_n - n\theta))$.

We will now show that the right hand side of (4.1.8) is not identically zero in $\theta$.
Let us consider

(4.1.9) $E(n) = \sum_{j=1}^{\infty} j \int_{R_j} (2\pi)^{-j/2} \exp\left(-\tfrac{1}{2} \sum_{i=1}^{j} (x_i - \theta)^2\right) \prod_{i=1}^{j} dx_i$.




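Differentiating with respect to $\theta$, we get

(4.1.10) $\dfrac{d}{d\theta} E(n) = \sum_{j=1}^{\infty} j \int_{R_j} (2\pi)^{-j/2} \left[\sum_{i=1}^{j} (x_i - \theta)\right] \exp\left(-\tfrac{1}{2} \sum_{i=1}^{j} (x_i - \theta)^2\right) \prod_{i=1}^{j} dx_i$.

The right-hand side of the above equation being equal to $E(n(Z_n - n\theta))$, the latter
does not vanish identically in $\theta$, because the left-hand side is not identically
zero. The step from (4.1.9) to (4.1.10) can be easily seen to be valid.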
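
For convenience, the reduction of (4.1.6) to (4.1.8) can be written out in full. The display below only combines (4.1.6) and (4.1.7) with the fact that the third central moment of a normal variable is zero, and then uses the identification of the derivative of $E(n)$ made after (4.1.10); nothing beyond the text is assumed.

```latex
% Substitute (4.1.7) into (4.1.6), using E(X - \theta)^3 = 0 for the normal law (4.1.1):
\begin{align*}
E\big(\phi_1(n)\,\phi_2(n)\big)
  &= E(Z_n - n\theta)^3 - E\big(n(Z_n - n\theta)\big) \\
  &= \big[\,En \cdot 0 + 3\,E\big(n(Z_n - n\theta)\big)\big] - E\big(n(Z_n - n\theta)\big) \\
  &= 2\,E\big(n(Z_n - n\theta)\big) \;=\; 2\,\frac{d}{d\theta}E(n).
\end{align*}
```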

Ex. 2. The Wald sequential procedure for testing $p = p_1$ against $p = p_2$ in a
binomial distribution, where $p$ is the probability of the event occurring, is given
as follows: If

(4.1.11) $B < \sum_{i=1}^{s} (x_i - d) < A$, $(s = 1, 2, \dots, j - 1)$,

and

(4.1.12) $\sum_{i=1}^{j} (x_i - d)$ is either $> A$ or $< B$,

where $d$ is given by $\left[\log \dfrac{1 - p_1}{1 - p_2}\right] \Big/ \log\left[\dfrac{p_2(1 - p_1)}{p_1(1 - p_2)}\right]$, the
process stops with the $j$th observation and a decision is taken. Here, $x_i$ is the
characteristic function of the event at the $i$th trial, that is:

$x_i = 1$, when the event occurs at the $i$th trial;

$x_i = 0$, otherwise.

Let us denote the set of points satisfying (4.1.11) and (4.1.12) by $R_j$. In this
case we find

(4.1.13) $E[\phi_1(n) \cdot \phi_2(n)] = \dfrac{2}{p^2(1 - p)^2} \cdot E(n(Z_n - np))$,

where $Z_n = \sum_{i=1}^{n} x_i$. We have now to show that the right-hand side is not
identically zero. Differentiating

(4.1.14) $E(n) = \sum_{j=1}^{\infty} j \sum_{R_j} p^{Z_j}(1 - p)^{j - Z_j}$

with regard to $p$, we obtain

(4.1.15) $\dfrac{d}{dp}(E(n)) = \sum_{j=1}^{\infty} j \sum_{R_j} \dfrac{Z_j - jp}{p(1 - p)}\, p^{Z_j}(1 - p)^{j - Z_j}$.

The right-hand side of the above is the same as

(4.1.16) $\dfrac{1}{p(1 - p)}\, E(n(Z_n - np))$.

Thus the left-hand side of (4.1.15) being not identically zero, the same is true for
(4.1.16), and consequently the bound given by Wolfowitz is not achieved.





The step from (4.1.14) to (4.1.15) is valid as

(4.1.17)

is absolutely and uniformly convergent.
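
The relation between the derivative of $E(n)$ with respect to $p$ and $E(n(Z_n - np))/(p(1-p))$ used in Ex. 2 can be checked numerically for a concrete stopping rule. The Python sketch below is an illustration only. It assumes a truncated version of the binomial rule: sampling is forced to stop at a hypothetical cap `n_max`, which keeps every sum finite and leaves the identity intact, since it holds for any stopping rule with bounded $n$ that does not itself depend on $p$. The values of `d`, `a`, `b` are arbitrary choices, not taken from the text, and the function names are hypothetical.

```python
def sprt_moments(p, d=0.5, a=2.0, b=-2.0, n_max=200):
    """Exact E(n) and E(n (Z_n - n p)) for a truncated binomial sequential rule.

    Sampling stops as soon as sum(x_i - d) is > a or < b, or at n_max
    observations (the truncation is an assumption of this sketch).
    """
    alive = {0: 1.0}  # z -> probability of z successes so far without stopping
    e_n = 0.0         # accumulates E(n)
    e_nz = 0.0        # accumulates E(n (Z_n - n p))
    for s in range(1, n_max + 1):
        survivors = {}
        for z, prob in alive.items():
            for x, px in ((1, p), (0, 1.0 - p)):
                z_new = z + x
                prob_new = prob * px
                stat = z_new - s * d
                if stat > a or stat < b or s == n_max:
                    # the procedure stops with the s-th observation
                    e_n += s * prob_new
                    e_nz += s * (z_new - s * p) * prob_new
                else:
                    survivors[z_new] = survivors.get(z_new, 0.0) + prob_new
        alive = survivors
    return e_n, e_nz


def derivative_of_expected_n(p, h=1e-5, **rule):
    """Central finite-difference approximation to d E(n) / dp."""
    upper, _ = sprt_moments(p + h, **rule)
    lower, _ = sprt_moments(p - h, **rule)
    return (upper - lower) / (2.0 * h)
```

At a chosen value such as `p = 0.4`, `derivative_of_expected_n(p)` can be compared with `sprt_moments(p)[1] / (p * (1 - p))`; the two should agree up to finite-difference error, and neither is zero when the boundaries are symmetric and `p` differs from `d`.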

§4.2. Let $\theta^*$ be some unbiased estimate of $\theta$, where the $x_i$'s are successive
independent observations on the chance variable $X$ having the probability density
function or probability function $f(x; \theta)$. We adopt a sequential procedure
mentioned in §2.1 satisfying the regularity conditions in §2.2 and also postulate
the following:

(i) For all positive integral values of $M > N$,

$p_M(x_1, x_2, \dots, x_M; \theta) = \prod_{i=1}^{M} f(x_i; \theta)$

possesses an ‘efficient’ estimate for $\theta$, where $N$ is the least value of $n$ for which
$\Pr(n = N) \ne 0$.

(ii) $E(n)$ exists and admits derivatives up to the second order with respect
to $\theta$. Furthermore, $\frac{d}{d\theta}(E(n))$ is either zero for all $\theta$ under consideration or is
never zero.

Under the above conditions the Wolfowitz lower bound for the variance of unbiased
estimates is achieved only when $\Pr(n = N) = 1$.

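Proof: This bound will be attained if and only if there exists an unbiased
estimate $\theta^*$ of $\theta$ such that

(4.2.1) $E(\theta^* - \theta - K\phi_1(n))^2 = 0$,

that is,

(4.2.2) $\theta^* - \theta = K\phi_1(n)$

with probability one, where $K$ is independent of all $x_i$'s and $n$. As there exists
an ‘efficient’ estimate, say $\psi(M)$, for all $M > N$, we have

(4.2.3) $\psi(M) - \theta = \dfrac{\phi_1(M)}{M\, E\left[\left(\frac{\partial}{\partial\theta} \log f(x; \theta)\right)^2\right]}$

for all $M > N$. From (4.2.2) and (4.2.3), it follows that

(4.2.4) $\theta^* - \theta = K \cdot n \cdot (\psi(n) - \theta) \cdot E\left[\left(\frac{\partial}{\partial\theta} \log f(x; \theta)\right)^2\right]$.

Now as

(4.2.5) $K = \dfrac{1}{En \cdot E\left[\left(\frac{\partial}{\partial\theta} \log f(x; \theta)\right)^2\right]}$,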



we have

(4.2.6) $\theta^* - \theta = \dfrac{n \cdot (\psi(n) - \theta)}{En}$.

If $E(n)$ is independent of $\theta$, then from (4.2.6), we obtain

(4.2.7) $n / E(n) = 1$,

that is, $n$ is constant with probability one and the sequential procedure reduces
to a fixed size sample case. If $E(n)$ is not independent of $\theta$, then differentiating
(4.2.6) with regard to $\theta$, we obtain

(4.2.8) $1 = \dfrac{n\,(\psi(n) - \theta) \cdot \frac{d}{d\theta}(En)}{(En)^2} + \dfrac{n}{En}$.

As $\frac{d}{d\theta}(En)$ is not equal to zero for any $\theta$ under consideration, substituting the
value of $\psi(n)$ from (4.2.8) in (4.2.6), the latter takes the form:

(4.2.9) $\theta^* - \theta = \dfrac{En - n}{\frac{d}{d\theta}(En)}$.

Differentiating the above with respect to $\theta$, the result is:

(4.2.10) $-1 = -\dfrac{En - n}{\left(\frac{d}{d\theta}(En)\right)^2} \cdot \dfrac{d^2}{d\theta^2}(En) + 1$.

Now if $\frac{d^2}{d\theta^2}(En) = 0$, then (4.2.10) is not valid, thereby contradicting (4.2.2).
If $\frac{d^2}{d\theta^2}(En) \ne 0$, then rearranging (4.2.10), we obtain

(4.2.11) $n = -\dfrac{2\left(\frac{d}{d\theta}(En)\right)^2}{\frac{d^2}{d\theta^2}(En)} + En$,

that is, $n$ is a constant with probability one. This proves that the Wolfowitz bound
is attained only in the case when $n = N$ with probability one. This generalizes
the result of Blackwell and Girshick [6] to the extent that in [6] the existence of
an efficient estimate is assumed for all integral values of $M$ instead of $M > N$.
Moreover, the proof given here, with slight modifications, is also valid when
the successive observations are not independent.

The assumption that $x_1 + x_2 + \cdots + x_M$ is a sufficient statistic for all $M$
really amounts to the assumption that there is an efficient statistic for all
$M$, when we restrict ourselves to probability density functions satisfying the
conditions given by Koopman in [7].





§4.3. Let us consider a sample of fixed size $N$. Let $\theta^*$ together with the
probability density function $p_N$ satisfy the following regularity conditions:

(i). There exists a transformation $T$ from $(x_1, x_2, \dots, x_N)$ to the variables

(4.3.1) $\xi_i = \xi_i(x_1, \dots, x_N)$, $\theta^* = \theta^*(x_1, \dots, x_N)$, $(i = 1, 2, \dots, N - 1)$,

such that

(a). The functions $\xi_i$ are everywhere unique and continuous, and have
continuous partial derivatives

$\dfrac{\partial \xi_i}{\partial x_u}$, $\dfrac{\partial \theta^*}{\partial x_u}$, $(i = 1, 2, \dots, N - 1;\; u = 1, 2, \dots, N)$,

in all points $(x_1, x_2, \dots, x_N)$ except possibly in certain points belonging
to a finite number of hyper-surfaces.

(b). The relations (4.3.1) define a one-to-one correspondence between the
points $x = (x_1, x_2, \dots, x_N)$ and $y = (\xi_1, \xi_2, \dots, \xi_{N-1}, \theta^*)$ so that
conversely $x_i = \eta_i(\xi_1, \xi_2, \dots, \xi_{N-1}, \theta^*)$ where the $\eta_i$ are unique.

(ii). There exist partial derivatives of $g(\theta^*; \theta)$, $h(\xi_1, \xi_2, \dots, \xi_{N-1} \mid \theta^*)$ with
regard to $\theta$ of all orders up to and including $m$, where $m$ is some finite integer.
The variances of $\theta^*$, $h_i$ and $g_i \cdot g_j$, $(i, j = 1, 2, \dots, m)$, are finite, where $h_i$ and $g_i$
are defined in section 1.

(iii). There exist functions $T_{ij}$ $(i = 1, 2, \dots, m;\; j = 1, 2, 3)$ such that

$\left|\dfrac{\partial^i p_N}{\partial\theta^i}\right| < T_{i1}(x_1, x_2, \dots, x_N)$; $\left|\dfrac{\partial^i g}{\partial\theta^i}\right| < T_{i2}(\theta^*)$; $\left|\dfrac{\partial^i h}{\partial\theta^i}\right| < T_{i3}(\xi_1, \xi_2, \dots, \xi_{N-1}; \theta^*)$,

for all $\theta$ in $D$ and for almost all $(x_1, x_2, \dots, x_N)$, where $D$ is an open interval.
Further

$\int T_{i1}(x_1, x_2, \dots, x_N) \prod_{u=1}^{N} dx_u$, $\int T_{i2}(\theta^*)\, d\theta^*$ and $\int T_{i3}(\xi_1, \xi_2, \dots, \xi_{N-1}; \theta^*) \prod_{u=1}^{N-1} d\xi_u$

are all finite, the range of integration, in each case, being the whole range for
the arguments indicated. Then the necessary and sufficient conditions that the
variance of $\theta^*$ equals the lower bound given in (1.1.4) are




(iv). $h_1, h_2, \dots, h_m$ are linearly dependent considered as functions of $\xi_1, \xi_2,
\dots, \xi_{N-1}$ for any given $\theta^*$ and $\theta$, and

(v). The probability density function $g$ of $\theta^*$ is of the form

$\theta^* - \theta = \sum_{i=1}^{m} K_i\, g_i$,

where the $K_i$ may depend upon $\theta$ and $N$ only.

The proof here is given when $p_N$ is a probability density function. It is also
valid with slight modification when $p_N$ is the probability of discrete variables.

Proof: Let $J$ be the Jacobian of the transformation $T$ in (4.3.1). Then
because of conditions (i) and (ii) above, we have

(4.3.2) $p_N(x_1, x_2, \dots, x_N; \theta) \cdot |J| = g(\theta^*; \theta) \cdot h(\xi_1, \xi_2, \dots, \xi_{N-1} \mid \theta^*; \theta)$.

Further

(4.3.3) $\int h(\xi_1, \xi_2, \dots, \xi_{N-1} \mid \theta^*; \theta) \prod_{u=1}^{N-1} d\xi_u = 1$,

the range of integration being the space of $\xi_1, \xi_2, \dots, \xi_{N-1}$. Differentiating
the above $i$ times under the integral sign, it follows that

(4.3.4) $E(h_i \mid \theta^*, \theta) = 0$.

Similarly we have

(4.3.5) $E(g_j \cdot h_i) = 0$,

as the expectation of the quantity on the L.H.S. is finite by virtue of (ii). More
generally, we have

(4.3.6) $E(F(\theta^*) \cdot h_i) = E[F(\theta^*) \cdot E(h_i \mid \theta^*)] = 0$

if $E(F(\theta^*) \cdot h_i)$ is finite. Let us now examine

(4.3.7) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i\, \phi_i(N)\right)^2$,

where $K_i \phi_i(N)$ can also be written as

(4.3.8) $K_i \left(g_i + \binom{i}{1} g_{i-1} h_1 + \cdots + h_i\right)$.

Now (4.3.7) can be put in the form

(4.3.9) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i\, g_i - \sum_{i=1}^{m} F_i\, h_i\right)^2$,

where the $F_i$ $(i = 1, 2, \dots, m)$ clearly depend on $\theta$ and $\theta^*$ only.





By virtue of (4.3.4) to (4.3.6), and $F(\theta^*)$ involved in (4.3.9) being such that
$E[F(\theta^*) \cdot h_i]$ $(i = 1, 2, \dots, m)$ is finite because of (ii), we can further reduce
(4.3.9) to

(4.3.11) $E\left(\theta^* - \theta - \sum_{i=1}^{m} K_i\, g_i\right)^2 + E_{\theta^*} E\left(\left(\sum_{i=1}^{m} F_i\, h_i\right)^2 \Bigm| \theta^*\right)$.

The lower bound will be achieved if and only if the above expression is zero,
the necessary and sufficient conditions for which are:

(4.3.12) $\theta^* - \theta = \sum_{i=1}^{m} K_i \cdot g_i$,

and

(4.3.13) $\sum_{i=1}^{m} F_i\, h_i = 0$ in $\xi_1, \xi_2, \dots, \xi_{N-1}$

for any given values of $\theta^*$ and $\theta$.

(4.3.13) is equivalent to the condition that the $h_i$ $(i = 1, 2, \dots, m)$ are linearly
dependent considered as functions of $\xi_1, \xi_2, \dots, \xi_{N-1}$ for any given values of $\theta$
and $\theta^*$.

When $m$ takes the value one, the above reduces to the Cramér conditions for
the existence of an “efficient” estimate.

§4.4. Multiparameter case. Let $\theta_1^*, \theta_2^*, \dots, \theta_T^*$ be the unbiased estimates
of $\theta_1, \theta_2, \dots, \theta_T$ in the probability density function

$p_N(x_1, x_2, \dots, x_N; \theta_1, \theta_2, \dots, \theta_T)$,

and let the regularity conditions of §4.3 be satisfied when $\theta^*$ and $h_i$ $(i = 1, 2, \dots,
m)$ are replaced by $\theta_j^*$ $(j = 1, 2, \dots, T)$ and

$h_{i_1 i_2 \cdots i_T}$ $(i_1 + i_2 + \cdots + i_T \le m)$

respectively. Further let

(4.4.1) $p_N(x_1, x_2, \dots, x_N; \theta_1, \theta_2, \dots, \theta_T) \cdot |J| = g(\theta_1^*, \theta_2^*, \dots, \theta_T^*; \theta_1, \theta_2, \dots, \theta_T) \cdot h(\xi_1, \xi_2, \dots, \xi_{N-T} \mid \theta_1^*, \theta_2^*, \dots, \theta_T^*)$,

where $g$ and $h$ are respectively the joint probability distribution function of
$\theta_1^*, \theta_2^*, \dots, \theta_T^*$ and the conditional probability distribution of $\xi_1, \xi_2, \dots, \xi_{N-T}$
for a given set of values of $\theta_1^*, \theta_2^*, \dots, \theta_T^*$. In order that the ellipsoid (3.1.20)
coincides with the one given by (3.1.21), it is necessary and sufficient that the
following be satisfied for each $t$ $(t = 1, 2, \dots, T)$:

(4.4.2) $E\left(\theta_t^* - \theta_t - \sum_{i_1 + i_2 + \cdots + i_T \le m} K^t_{i_1 i_2 \cdots i_T}\, \phi_{i_1 i_2 \cdots i_T}\right)^2 = 0$.





Now reasoning similar to that in §4.3, we conclude from the above that the
necessary and sufficient conditions are:

There exist $T$ independent linear combinations of

(4.4.3) $h_{i_1 i_2 \cdots i_T}$, $(i_1 + i_2 + \cdots + i_T \le m)$,

which vanish with probability one for any given values of the sets

$(\theta_1^*, \theta_2^*, \dots, \theta_T^*)$ and $(\theta_1, \theta_2, \dots, \theta_T)$,

and

(4.4.4) $\theta_t^* - \theta_t = \sum_{i_1 + i_2 + \cdots + i_T \le m} K^t_{i_1 i_2 \cdots i_T}\, g_{i_1 i_2 \cdots i_T}$, $(t = 1, 2, \dots, T)$,

where the $K$'s do not depend upon the $\theta_t^*$'s and $\xi$'s. For $T = 1$, the above reduce
to the conditions in §4.3. We will now give an example in which (4.4.3) and
(4.4.4) are satisfied. Let

(4.4.5) $p_N(x_1, x_2, \dots, x_N; \theta_1, \theta_2) = \dfrac{1}{(2\pi\theta_2)^{N/2}} \exp\left[-\dfrac{1}{2\theta_2} \sum_{i=1}^{N} (x_i - \theta_1)^2\right]$.

We have

(4.4.6) $\theta_2^* = \sum_{i=1}^{N} (x_i - \bar{x})^2 / (N - 1)$,

(4.4.7) $\theta_1^* = \sum_{i=1}^{N} x_i / N = \bar{x}$,

unbiased estimates of $\theta_1$ and $\theta_2$ in (4.4.5). The joint distribution of $\theta_1^*$ and
$\theta_2^*$ is given by

__ 0? Id’s 

1 S Wi jV(<V - 1) ' u SO] ’ 


(4.4 9) 


tf 


«, + _J?L i 

1 si „ — r ~ T ’ 


N 


(44,10) s? = & + !;’-• 

N a dtii 

Thus the concentration ellipsoid coincides with the ellipsoid
(3.1.21) for $m = 2$. On the other hand, if we use $m = 1$, the condition
(4.4.3) is satisfied but not the one in (4.4.4), as can be seen from (4.4.10);
the concentration ellipsoid then strictly contains within itself the one given by the
information matrix. It may be noted that for $m = 1$, the condition (4.4.3)




merely requires that a system of sufficient statistics exists for estimating $\theta_1,
\theta_2, \dots, \theta_T$. The reason is that the condition (4.4.3) takes the equivalent form

(4.4.11) $\dfrac{\partial h}{\partial \theta_i} = 0$

for $i = 1, 2, \dots, T$ identically in $\xi_1, \xi_2, \dots$, that is, that $h$ is free of
$\theta_1, \theta_2, \dots, \theta_T$.


5. Miscellaneous. In §5.1 to §5.3 we discuss certain properties of $\phi_i(n)$. In
§5.4 we obtain conditions under which there exists no unbiased estimate of $\theta$,
having a finite variance, which is functionally dependent upon a given unbiased
estimate $\theta^*$ of $\theta$.

§5.1. Assume that there exists an “efficient” statistic $\theta^*(x_1, x_2, \dots, x_N)$
for estimating $\theta$ in the probability density function (or probability)

$p_N(x_1, x_2, \dots, x_N; \theta)$.

That is,

(5.1.1) $\theta^*(x_1, x_2, \dots, x_N) - \theta = K \phi_1(N)$,

where $K$ as usual may only depend on $\theta$. We postulate as usual the existence
of all partial derivatives of $p_N$ of all orders and also of $K$ up to the third order
with

(5.1.2) $\dfrac{d^3 K}{d\theta^3} = 0$.

Further we assume that

$\left|\dfrac{\partial^i p_N}{\partial \theta^i}\right| < T_i(x_1, x_2, \dots, x_N)$,

where

$\int T_i(x_1, x_2, \dots, x_N) \prod_{u=1}^{N} dx_u$ is finite for all $i$.

Under the above assumptions we will show that

$\phi_0(N) = 1, \phi_1(N), \phi_2(N), \dots, \phi_i(N), \dots$

form a set of orthogonal polynomials in $\phi_1(N)$ with respect to the weight function

$p_N(x_1, x_2, \dots, x_N; \theta)$.

Proof: We can easily see that

(5.1.3) $\dfrac{\partial \phi_i}{\partial \theta} = \phi_{i+1} - \phi_i\, \phi_1$,

where $\phi_i(N)$ is shortened to $\phi_i$ for convenience. Differentiating (5.1.1) with
respect to $\theta$,

(5.1.4) $\dfrac{\partial \phi_1}{\partial \theta} = -\dfrac{1}{K} - \dfrac{1}{K} \dfrac{dK}{d\theta}\, \phi_1$.





Let us designate

(5.1.5) $z_i = \dfrac{1}{K} \dfrac{d^i K}{d\theta^i}$

for all integral values of $i$. From (5.1.3) and (5.1.4), it follows that

(5.1.6) $\phi_2 - \phi_1 \phi_1 = -z_1 \phi_1 - \dfrac{1}{K}$.

Differentiating (5.1.6) further with regard to $\theta$ and using (5.1.3) and (5.1.6) we
obtain

(5.1.7) $\phi_3 - \phi_1 \phi_2 = -2 z_1 \phi_2 - \left[z_2 + \dfrac{2}{K}\right] \phi_1$.

Differentiating (5.1.7) with regard to $\theta$, and using (5.1.2), we get

(5.1.8) $\phi_4 - \phi_1 \phi_3 = -3 z_1 \phi_3 - \left[3 z_2 + \dfrac{3}{K}\right] \phi_2$.

We assume generally that

(5.1.9) $\phi_{i+1} - \phi_1 \phi_i = -i z_1 \phi_i - \left[\dfrac{i(i-1)}{2} z_2 + \dfrac{i}{K}\right] \phi_{i-1}$.

Differentiating (5.1.9), and employing (5.1.2), (5.1.3) and (5.1.9), we obtain

(5.1.10) $\phi_{i+2} - \phi_1 \phi_{i+1} = -(i + 1) z_1 \phi_{i+1} - \left[\dfrac{(i+1) i}{2} z_2 + \dfrac{i + 1}{K}\right] \phi_i$.

We know that (5.1.9) holds for $i = 1, 2, 3$, $\phi_0$ being taken equal to one, and
we have proved that if (5.1.9) is true for $i = j$, it is true for $i = j + 1$. Thus
by mathematical induction (5.1.9) holds good for all integral values of $i$.

It is also clear from (5.1.6) and (5.1.9) that $\phi_i$ can be expressed as a polynomial
in $\phi_1$ of the $i$th degree, the coefficient of $\phi_1^i$ being equal to unity.

To complete the proof of our assertion we will now prove that

(5.1.11) $E(\phi_i \cdot \phi_j) = 0$, $i \ne j$.

From (5.1.9)

(5.1.12) $\phi_1 \phi_i = \phi_{i+1} + i z_1 \phi_i + \left[\dfrac{i(i-1)}{2} z_2 + \dfrac{i}{K}\right] \phi_{i-1}$,

where $i$ is any positive integer. We multiply both sides of (5.1.12) by $\phi_1$, and
reduce every product $\phi_1 \phi_u$ to a linear combination of $\phi_{u+1}$, $\phi_u$ and $\phi_{u-1}$ with the
help of (5.1.12). Repeating this process $j - 1$ times $(j < i)$ it follows that

(5.1.13) $\phi_1^j \phi_i = \phi_{i+j} + \sum_{u=1}^{2j-1} d^j_u\, \phi_{i+j-u} + d^j_{2j}\, \phi_{i-j}$,

where the $d$'s are functions of $K$, $z_1$ and $z_2$. From (5.1.13), by taking expectations,

(5.1.14) $E(\phi_1^j \cdot \phi_i) = 0$, $(j < i)$.





Now, since $\phi_j$ is a polynomial of the $j$th degree in $\phi_1$, we conclude that (5.1.11)
is true for all integral (positive) values of $i$.
Thus we obtain

(5.1.15) $\phi_0(N) = 1, \phi_1(N), \phi_2(N), \dots, \phi_i(N), \dots$

as a set of orthogonal polynomials in $\phi_1(N)$, the weight function being

$p_N(x_1, x_2, \dots, x_N; \theta)$.
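
As an illustration of (5.1.15), the following Python sketch checks the orthogonality exactly in one concrete case. It assumes a normal population with unit variance and unknown mean $\theta$ (as in Ex. 1 of §4.1), for which $K = 1/N$ is constant, so that the recurrence reduces to $\phi_{i+1} = \phi_1 \phi_i - iN\phi_{i-1}$ and $\phi_1(N) = Z_N - N\theta$ is normally distributed with mean zero and variance $N$. The function names and the coefficient-list representation of polynomials are choices made here for illustration and are not part of the text.

```python
def normal_moment(k, n_obs):
    """E(u**k) for u ~ N(0, n_obs): zero for odd k, (k-1)!! * n_obs**(k/2) for even k."""
    if k % 2 == 1:
        return 0
    double_factorial = 1
    for odd in range(1, k, 2):
        double_factorial *= odd
    return double_factorial * n_obs ** (k // 2)


def phi_polynomials(count, n_obs):
    """Coefficient lists (index = power of u = phi_1) of phi_0, ..., phi_{count-1}."""
    polys = [[1], [0, 1]]
    for i in range(1, count - 1):
        nxt = [0] + polys[i]                 # u * phi_i
        for power, coeff in enumerate(polys[i - 1]):
            nxt[power] -= i * n_obs * coeff  # minus i * N * phi_{i-1}
        polys.append(nxt)
    return polys[:count]


def expectation_of_product(p, q, n_obs):
    """E(p(u) * q(u)) under u ~ N(0, n_obs), computed exactly from the moments."""
    return sum(a * b * normal_moment(i + j, n_obs)
               for i, a in enumerate(p)
               for j, b in enumerate(q))
```

For any sample size `n_obs`, `expectation_of_product` applied to two different polynomials from `phi_polynomials` should vanish, while for equal indices it should equal $i!\,N^i$, the squared norm of the corresponding Hermite-type polynomial. Integer arithmetic keeps the check free of rounding error.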

Furthermore

(5.1.16) $\phi_1^{i-1} \cdot \phi_i = \phi_{2i-1} + \sum_{u=1}^{2i-3} d^{i-1}_u\, \phi_{2i-1-u} + d^{i-1}_{2i-2}\, \phi_1$,

where 






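$A_i$ and $B_i$, the coefficients of $\phi_i$ and $\phi_{i-1}$ respectively in (5.1.12), for the above
four cases are given below:

Case    $A_i$                                        $B_i$

1       $0$                                          $iN$

2       $\dfrac{2i}{\theta}$                         $\dfrac{i(i - 1)}{\theta^2} + \dfrac{iN}{2\theta^2}$

3       $\dfrac{i(1 - 2\theta)}{\theta(1 - \theta)}$   $-\dfrac{i(i - 1)}{\theta(1 - \theta)} + \dfrac{iN}{\theta(1 - \theta)}$

4       $\dfrac{i}{\theta}$                          $\dfrac{iN}{\theta}$

It may be mentioned that in all these cases $\{\phi_i\}$ are also a complete set of
polynomials.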

§5.2. Let $\sum_{i=1}^{m} K_i \phi_i(n)$, where $K_i$ $(i = 1, 2, \dots, m)$ depends upon $\theta$, be such that
$\sum_{i=1}^{m} K_i \phi_i(n)$ satisfies the regularity conditions mentioned in §2.2. Then
we will show that $\sum_{i=1}^{m} K_i \phi_i(n)$ cannot be a function of $x_1, x_2, \dots, x_n$ alone except
for constant zero.

Proof: Let us assume that $\sum_{i=1}^{m} K_i \phi_i(n)$ is independent of $\theta$, that is, it is
some statistic, say,

(5.2.1) $\theta^*(x_1, x_2, \dots, x_n) = \sum_{i=1}^{m} K_i \phi_i(n)$.

Taking expectations of both the sides, we obtain

(5.2.2) $E(\theta^*(x_1, x_2, \dots, x_n)) = \sum_{i=1}^{m} K_i\, E\phi_i(n) = 0$.

Differentiating (5.2.2) $i$ times with regard to $\theta$, we have, because of the
regularity conditions on $\phi_i(n)$ and $\theta^*(x_1, x_2, \dots, x_n)$,

(5.2.3) $E[\theta^*(x_1, x_2, \dots, x_n) \cdot \phi_i(n)] = 0$, $i = 1, 2, \dots, m$.

It may be noted this is similar to the result in (2.3). From (5.2.3) and (5.2.1)
it follows that

(5.2.4) $E[\theta^*(x_1, x_2, \dots, x_n)]^2 = 0$.

Thus $\theta^*(x_1, x_2, \dots, x_n)$ is zero with probability one, that is,

$\sum_{i=1}^{m} K_i \phi_i(n)$,

if independent of $\theta$, is zero with probability one. This proves our assertion
that this cannot be a function of $x_1, x_2, \dots, x_n$ alone except for constant zero.
From the foregoing we deduce the following conclusions:

I. $\phi_i(n)$ or any power of it cannot be a function of the observations free of $\theta$.





II. If a statistic $\theta^*(x_1, x_2, \dots, x_n)$, which is not a constant with probability
one, can be put in the form

(5.2.5) $\theta^*(x_1, x_2, \dots, x_n) = K_0 + \sum_{i=1}^{m} K_i \phi_i(n)$,

where $m$ is some finite positive integer, then

(i) $K_0$ must depend upon $\theta$.

(ii) The expression (5.2.5) for $\theta^*(x_1, x_2, \dots, x_n)$ in $\phi_i(n)$ is unique.

(iii) No other unbiased estimate of $K_0$ satisfying the regularity conditions
can be put in the form (5.2.5).

(iv) When $m = 1$, there is no other statistic except $a\theta^* + b$, where $a$ and $b$
are constants independent of $\theta$, which can be put in the above form
$K_0 + K_1 \phi_1(n)$; here $K_0$ and $K_1$ are differentiable functions of $\theta$ and $K_1$ does
not vanish for any $\theta$ under consideration.

(v) Let $\xi$ be any function of $x_1, x_2, \dots, x_n$ free of $\theta$, satisfying the regularity
conditions of §2.2 with $E(\xi) = 0$. Since the covariance between $\xi$ and
$\theta^*(x_1, x_2, \dots, x_n)$ in (5.2.5) is equal to zero, the statistic of the form
(5.2.5) has the least variance of all unbiased estimates of $K_0$ that satisfy
the regularity conditions of §2.2.

Also, if the probability density or the probability function depends on more
than one parameter then all the above results except (iv) hold good if

$\sum_{i=1}^{m} K_i \phi_i(n)$

is replaced by

$\sum_{i_1 + i_2 + \cdots + i_T \le m} K_{i_1 i_2 \cdots i_T}\, \phi_{i_1 i_2 \cdots i_T}$.

§5.3. Let us now prove the assertion made in (iv) of §5.2, when $m$ is equal to
one.

Suppose the contrary, that there is a statistic $\theta_1^*(x_1, x_2, \dots, x_n)$ which is of
the form

(5.3.1) $\theta_1^*(x_1, x_2, \dots, x_n) = L_0 + L_1 \cdot \phi_1(n)$.

$\theta^*(x_1, x_2, \dots, x_n)$, of course, has the form

(5.3.2) $\theta^*(x_1, x_2, \dots, x_n) = K_0 + K_1 \cdot \phi_1(n)$.

We will assume $K_0$, $K_1$, $L_0$, $L_1$ to be differentiable functions of $\theta$ and that
$K_1$, $L_1$ do not vanish for values of $\theta$ under consideration.
Differentiating, with respect to $\theta$, the expressions in (5.3.1) and (5.3.2), we
have

(5.3.3) $\dfrac{dL_0}{d\theta} + \dfrac{dL_1}{d\theta} \cdot \phi_1 + L_1(\phi_2 - \phi_1^2) = 0$,

(5.3.4) $\dfrac{dK_0}{d\theta} + \dfrac{dK_1}{d\theta} \cdot \phi_1 + K_1(\phi_2 - \phi_1^2) = 0$,





where $\phi_i$ is short for $\phi_i(n)$. Taking the expectations of the above and rearranging,
it follows that

(5.3.5) $\dfrac{1}{L_1} \dfrac{dL_0}{d\theta} = \dfrac{1}{K_1} \dfrac{dK_0}{d\theta}$.

From (5.3.3) to (5.3.5), we deduce that

(5.3.6) $\dfrac{1}{L_1} \dfrac{dL_1}{d\theta} = \dfrac{1}{K_1} \dfrac{dK_1}{d\theta}$.

Now solving the above differential equation, we get

(5.3.7) $L_1 = aK_1$,

where $a$ is a constant independent of $\theta$. From (5.3.5) and (5.3.7) it follows that

(5.3.8) $L_0 = aK_0 + b$,

where $b$ is a constant independent of $\theta$. From (5.3.7) and (5.3.8) we conclude
that the statistic in (5.3.1) must be of the form $a\theta^* + b$, which proves our assertion.
An immediate consequence is that if there exists an efficient statistic for
estimating $\gamma(\theta)$, then no other function of $\theta$ except $a\gamma(\theta) + b$ can have an efficient
estimate.³

§5.4. If $\theta^*(x_1, x_2, \dots, x_n)$ is an unbiased estimate of $\theta$ satisfying the following
conditions:

(i) Among all unbiased estimates of $\theta$ having finite variances, which are also
functions of $\theta^*$, $\theta^*$ is one with the least variance,

(ii) For all $\theta$ there exists a complete set of polynomials with respect to the
distribution function of $\theta^*$,

then there exists no unbiased estimate of $\theta$ with
a finite variance, which is functionally dependent upon $\theta^*$, except $\theta^*$.

Proof: Let $\theta^*$ be the unbiased estimate of $\theta$ which has the least variance
among all unbiased estimates of $\theta$ which are functions of $\theta^*$. Further let $S(\theta^*)$
be any function of $\theta^*$ free of $\theta$, whose expectation exists and is equal to zero.
Let the variance of $S(\theta^*)$ be finite. It is well known that for any such $S(\theta^*)$

(5.4.1) $E(\theta^* S(\theta^*)) = 0$.

Now $\theta^* S(\theta^*)$ in turn having expectation equal to zero, we obtain

(5.4.2) $E(\theta^{*2} S(\theta^*)) = 0$.

Repeating the above $i$ times we obtain, in general, that

(5.4.3) $E[\theta^{*i} S(\theta^*)] = 0$

³ We assume the existence of ... $(i = 1, 2)$ ... for all $\theta$, and also postulate that
... and ... do not vanish for any $\theta$ under consideration.





for all positive integers $i$. From the above, with the help of condition (ii),
we conclude that $S(\theta^*)$ must be equal to zero. Thus if $h(\theta^*)$ is an unbiased
estimate of $\theta$ with finite variance, then from above, $h(\theta^*) - \theta^*$, having the
expectation zero and a finite variance, must be zero with probability one. Thus
$h(\theta^*)$ is the same as $\theta^*$, which proves the result.

Example. If $\theta^*$ is of the form (5.2.5) and condition (ii) is satisfied, then
there is no function of $\theta^*$, free of $\theta$ and having a finite variance, whose
expectation is $K_0$.

Conditions (i) and (ii) above are satisfied for estimating $\theta$ in the examples
quoted at the end of the section 5.1, and thus in these cases the result holds
good when $\theta^*$ is the efficient estimate.

I am highly thankful to Professor J. Wolfowitz for his guidance and help in
this research.


REFERENCES 

[1] H. Cramér, Mathematical Methods of Statistics, Princeton Univ. Press, 1946, p. 480.

[2] C. R. Rao, “Information and the accuracy attainable in the estimation of statistical
parameters,” Calcutta Math. Soc. Bull., September, 1946.

[3] A. Bhattacharyya, “On some analogues of the amount of information and their use
in statistical estimation,” Sankhyā, Vol. 8 (1946), also “On some analogues of the
amount of information and their use in statistical estimation,” Sankhyā, Vol. 8
(1947).

[4] H. Cramér, “Contributions to the theory of statistical estimation,” Skandinavisk
Aktuarietidskrift, Vol. 29 (1946), pp. 85-94.

[5] J. Wolfowitz, “Efficiency of sequential estimates,” Annals of Math. Stat., Vol. 18 (1947).

[6] Blackwell and Girshick, “A lower bound for the variance of some unbiased sequential
estimates,” Annals of Math. Stat., Vol. 18 (1947).

[7] B. O. Koopman, “On distributions admitting a sufficient statistic,” Am. Math. Soc.
Trans., Vol. 39 (1936), p. 399.



ON THE THEORY OF SOME NON-PARAMETRIC HYPOTHESES
By E. L. Lehmann and C. Stein
University of California, Berkeley

Summary. For two types of non-parametric hypotheses optimum tests
are derived against certain classes of alternatives. The two kinds of hypotheses
are related and may be illustrated by the following simple example: (1) The joint
distribution of the variables $X_1, \dots, X_m, Y_1, \dots, Y_n$ is invariant under all
permutations of the variables; (2) the variables are independently and identically
distributed. It is shown that the theory of optimum tests for hypotheses of the
first kind is the same as that of optimum similar tests for hypotheses of the
second kind. Most powerful tests are obtained against arbitrary simple alternatives,
and in a number of important cases most stringent tests are derived
against certain composite alternatives. For the example (1), if the distributions
are restricted to probability densities, Pitman's test based on $\bar{y} - \bar{x}$ is most
powerful against the alternatives that the $X$'s and $Y$'s are independently normally
distributed with common variance, and that $E(X_i) = \xi$, $E(Y_i) = \eta$ where
$\eta > \xi$. If $\eta - \xi$ may be positive or negative the test based on $|\bar{y} - \bar{x}|$ is most
stringent. The definitions are sufficiently general that the theory applies to
both continuous and discrete problems, and that tied observations present no
difficulties. It is shown that continuous and discrete problems may be combined.
Pitman's test for example, when applied to certain discrete problems,
coincides with Fisher's exact test, and when $m = n$ the test based on $|\bar{y} - \bar{x}|$ is
most stringent for hypothesis (1) against a broad class of alternatives which
includes both discrete and absolutely continuous distributions.

1. Generalities. In the present paper we study the problem of determining
optimum tests for certain non-parametric hypotheses. It is important in this
connection to make some distinctions which are of lesser significance when the
problem is approached from the intuitive point of view which has been customary
in this field. Consider for example the hypothesis $H$ that $Z_1, \dots, Z_N$ are
independently and identically distributed according to an unknown probability
density function. All tests which have been suggested for testing $H$ are valid
also for testing the hypothesis $H'$ that the unknown joint probability density
function of the $Z$'s is symmetric in its $N$ arguments. On the other hand, tests
which have optimum properties for testing $H'$ against a certain class of alternatives
will in general not possess the same properties when $H'$ is replaced by $H$.
From the present point of view the two hypotheses mentioned are essentially
different. We shall be concerned in this paper primarily with generalizations
of $H'$, and we shall show that many of the tests suggested in the literature have
optimum properties for testing hypotheses of this kind against certain classes of
alternatives.

The corresponding general theory for hypotheses related to $H$ is quite different.



However the two theories do coincide provided tests of these latter hypotheses
are restricted to similar regions. More specifically, all results on optimum
tests of $H'$ are equivalent to the corresponding results on optimum similar tests
of $H$, and this equivalence holds also for many of the more general hypotheses
considered in this paper.

It should be observed that in many experimental situations, the hypothesis
$H'$ that the joint distribution of the $Z$'s is invariant under all permutations is
more realistic than the hypothesis $H$ that the $Z$'s are independently and identically
distributed. For example, suppose there is a block of land divided into
$m + n$ plots, and the experimenter wants to test whether one of two fertilizers
(used in fixed amounts) is more effective than the other in increasing the yield
of a certain plant. Of the plots, $m$ are chosen at random, fertilizer I is applied
to these, and fertilizer II to the other $n$. If $X_i$ denotes the yield from the $i$th
plot to which fertilizer I has been applied and $Y_j$ denotes the yield from the $j$th
plot to which fertilizer II has been applied, where the plots are numbered at
random, then the hypothesis that the two fertilizers are completely equivalent
implies that the application of any permutation to $X_1, \dots, X_m, Y_1, \dots, Y_n$
does not change their joint distribution. But it is not reasonable to suppose the
$X_i$, $Y_j$ are independently and identically distributed, since there may be intrinsic
differences among the plots. For discussions of these and related points, see
Fisher [1], Neyman [2], Pitman [3]. It may be that in many particular cases
some hypothesis between the two is really appropriate but the hypothesis $H'$ is the
only one that is evidently appropriate from a cursory inspection of the setup.

Many of the alternative hypotheses considered below, for example those
involving normality, are dictated more by tradition and ease of treatment than
by appropriateness in actual experiments. Thus this paper should not be
considered as providing absolute justification for tests such as Pitman's but
rather as suggesting a method of obtaining optimum non-parametric tests when
the class of alternatives is fairly well specified.

Another possibility, first raised by Neyman [2], which has been ignored in this
paper is the equality on the average of the two fertilizers but with fertilizer I
having larger dispersion than fertilizer II, or a distribution differing in some
other characteristic. It would be reasonable to consider this as part of the
hypothesis tested, but tests based on randomization may give a probability of
rejection of the hypothesis of equivalence in this case which is much higher than
the stated level of significance. We hope to return to problems of this type in
later papers.

Let us make the following basic assumptions. $\mathcal{Z}$ is a space of points $z$ and $\mathcal{C}$
is an additive class of subsets $A$ of $\mathcal{Z}$. Any member of $\mathcal{C}$ will be said to be
measurable. By a probability distribution we mean a measure $F$, defined over
$\mathcal{C}$, for which $F(\mathcal{Z}) = 1$. We shall be concerned with two classes of probability
distributions: one, the class of all distributions, and two, the class of distributions
which are absolutely continuous with respect to a given measure $\mu$, that is,
the class of distributions $F$ for which there exists a function $f$ such that





(1.1) $F(A) = \int_A f(z)\, d\mu(z)$.

We shall call $f$ a generalized probability density function with respect to $\mu$. By
$Z$ we denote a random variable such that for any $A$ in $\mathcal{C}$,

(1.2) $P\{Z \in A\} = F(A)$.

For most of the applications we shall take $\mathcal{Z}$ to be a Euclidean space, and $\mathcal{C}$
to be the class of all Borel sets. Then if $\mu$ is Lebesgue measure, (1.1) states that
$f$ is a probability density function in the usual sense. However, we shall have
occasion to consider also some measures other than Lebesgue measure. By a
hypothesis $H$ we mean a class of probability distributions. Next we describe
the hypotheses with which we shall be concerned. Let $\Pi$ be a partition of $\mathcal{Z}$,
that is, let $\Pi$ be a class of mutually exclusive subsets $S$ of $\mathcal{Z}$ such that every
point $z$ of $\mathcal{Z}$ lies in one of the sets $S$. If two points $z_1$ and $z_2$ lie in the same set
we shall say that $z_1$ is equivalent to $z_2$ with respect to $\Pi$: $z_1 \sim z_2$ (mod $\Pi$). The
set of all points which are equivalent to $z$ will be denoted by $T(z)$, the number of
points of $T(z)$ by $n(z)$. Concerning $\Pi$ we make the following assumptions:

(i) All sets in $\Pi$ are finite, so that $n(z)$ is finite for all $z$.

(ii) If we define $S_n$ as the union of all those sets $S$ of $\Pi$ which contain exactly $n$
points, there exist mutually exclusive sets $S_n^{(1)}, \dots, S_n^{(n)}$ which are measurable
and such that every element $S$ of $\Pi$ containing exactly $n$ points has one and
only one point in common with each $S_n^{(i)}$.

We shall say that a measure $\mu$ is invariant under $\Pi$ if the following
holds: For all $n$ and $i, j \le n$, if $S$ is any set contained in $S_n^{(i)}$ and if $S'$ denotes
the set of equivalent points in $S_n^{(j)}$, then $\mu(S) = \mu(S')$.

Given any partition satisfying (i) and (ii), we formulate the hypothesis $H$
that the distribution $F$ of $Z$ is invariant under $\Pi$. We shall refer to $H$ as the
hypothesis of invariance under $\Pi$. We shall also consider the hypothesis of
invariance under a partition for a class of generalized densities $f$. In this case
we assume that the measure $\mu$ of (1.1) is given, and that $\Pi$ in addition to (i)
and (ii) satisfies the condition
that $\mu$ is invariant under $\Pi$. The hypothesis then
states that $z_1 \sim z_2$ (mod $\Pi$) implies $f(z_1) = f(z_2)$.

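By a test we mean a measurable function $\varphi$ with $0 \le \varphi(z) \le 1$, where $\varphi(z)$ is the probability of rejecting the hypothesis if the point $z$ is observed. Admitting randomized tests in this way avoids some of the difficulties encountered by Scheffé [5] in connection with similar regions.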

The size of a test $\varphi$ is defined to be

(1.3)  $\epsilon(\varphi) = \sup_{F \in H} \int \varphi(z)\, dF(z)$.

If in particular

(1.4)  $\int \varphi\, dF = \epsilon$ for all $F$ in $H$,

$\varphi$ is said to be similar for testing $H$. Extending the terminology of Scheffé, we say that $\varphi$ has structure $S(\epsilon)$ if for all $z$ in $S_n$

(1.5)  $\sum_{z' \in T(z)} \varphi(z') = n\epsilon$.


The following lemma restates a result of Scheffé.

Lemma 1. For testing a hypothesis of invariance, any test of structure $S(\epsilon)$ is similar and of size $\epsilon$.

Proof. For any $F$ in $H$ and any $\varphi$

(1.6)  $\int \varphi\, dF = \sum_n \sum_{i=1}^{n} \int_{S_n^{(i)}} \varphi\, dF = \sum_n \int_{S_n^{(1)}} \Big[ \sum_{z' \in T(z)} \varphi(z') \Big] dF(z)$.

But $\varphi$ has structure $S(\epsilon)$ and hence (1.5) holds for all $z$. Therefore

(1.7)  $\int \varphi\, dF = \sum_n n\epsilon \int_{S_n^{(1)}} dF = \epsilon$.


We shall show next that for testing a hypothesis of invariance at level of significance $\epsilon$, only tests of structure $S(\epsilon)$ need be considered. In order to make this result applicable both to hypotheses referring to the class of all distributions and to those referring to a class of generalized densities, we shall state it in an asymmetric form which, when taken together with lemma 1, indicates the essential equivalence of the two types of hypotheses.

Lemma 2. If $\varphi$ is any test of a hypothesis of invariance for the class of generalized densities with respect to a fixed measure $\mu$, and if the size of $\varphi$ is less than or equal to $\epsilon$, then there exists a test $\psi$ of structure $S(\epsilon)$ such that

(1.8)  $\int \psi\, dF \ge \int \varphi\, dF$

for all probability distributions $F$.
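Proof. First we shall show that

(1.9)  $\frac{1}{n(z)} \sum_{z' \in T(z)} \varphi(z') \le \epsilon$

almost everywhere $\mu$. For let $A$ be the set of points $z$ such that

(1.10)  $\frac{1}{n(z)} \sum_{z' \in T(z)} \varphi(z') > \epsilon$

and suppose that $\mu(A)$ is positive. Let

(1.11)  $f(z) = \begin{cases} 1/\mu(A) & \text{if } z \in A, \\ 0 & \text{elsewhere.} \end{cases}$

Then $f$ is in $H$, since by definition of $A$, whenever $z$ is in $A$ all points equivalent to $z$ are also in $A$. But

(1.12)  $\int \varphi f\, d\mu > \epsilon \int f\, d\mu = \epsilon$,

in contradiction to the assumption that the size of $\varphi$ does not exceed $\epsilon$.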

From (1.9) it follows easily that there exists a test $\psi$ of structure $S(\epsilon)$ such that for all $z$

(1.13)  $\psi(z) \ge \varphi(z)$.

This question is answered by

Theorem 1. Let $\Pi_0$ and $\Pi_1$ be two partitions of $\mathcal{Z}$, both satisfying conditions (i), (ii) and (iii), and such that $z \sim z' \pmod{\Pi_1}$ implies $z \sim z' \pmod{\Pi_0}$. For the class of generalized densities with respect to a measure $\mu$, let $H_i$ $(i = 0, 1)$ be the hypothesis of invariance relative to $\Pi_i$. Then for testing $H_0$ against $H_1$ at level of significance $\epsilon$, the totality of tests which (a) have structure $S(\epsilon)$, and for which (b) $z \sim z' \pmod{\Pi_1}$ implies $\varphi(z) = \varphi(z')$, form an essentially complete class of admissible tests.

Proof. It is easily seen that we can restrict ourselves to the consideration of tests of structure $S(\epsilon)$ which possess property (b). For if $\varphi$ is any test of structure $S(\epsilon)$ relative to $\Pi_0$, let

(1.14)

Then clearly $\varphi^*$ possesses property (b) and has structure $S(\epsilon)$. Furthermore, if $f$ is any probability density function of $H_1$, then

(1.15)  $\int \varphi^* f\, d\mu = \int \varphi f\, d\mu$,

so that $\varphi$ and $\varphi^*$ have identical power against $H_1$.
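The defining display (1.14) for $\varphi^*$ does not survive in this copy. One definition that has both properties claimed for $\varphi^*$ in the proof is the average of $\varphi$ over the equivalence class of $z$ under $\Pi_1$. The symbols $T_1(z)$ (the set of points equivalent to $z$ mod $\Pi_1$) and $n_1(z)$ (the number of such points) are introduced here only for illustration and are not taken from the text:

```latex
\varphi^*(z) \;=\; \frac{1}{n_1(z)} \sum_{z' \in T_1(z)} \varphi(z').
```

With this choice $\varphi^*$ is constant on each class of $\Pi_1$, which is property (b). Each class of $\Pi_0$ is a union of classes of $\Pi_1$, so averaging within the smaller classes leaves the sum over $T(z)$ unchanged and the structure $S(\epsilon)$ is preserved. Finally, a density $f$ of $H_1$ is constant on the classes of $\Pi_1$ and $\mu$ is invariant under $\Pi_1$, which gives the equality of the two integrals in (1.15).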

In order to complete the proof, we must show that if $\varphi_1$ and $\varphi_2$ are any two tests satisfying (a) and (b), and if $\varphi_1$ and $\varphi_2$ differ on a set of positive measure, there exists a probability density function $f$ of $H_1$ for which

(1.16)  $\int \varphi_1 f\, d\mu > \int \varphi_2 f\, d\mu$.

Since both $\varphi_1$ and $\varphi_2$ have structure $S(\epsilon)$, the set $A$ of points $z$ for which

(1.17)  $\varphi_1(z) > \varphi_2(z)$

has positive measure. Also, because of (b), if two points are equivalent relative to $\Pi_1$, they are either both in $A$ or both not in $A$. If $f(z)$ is defined as $1/\mu(A)$ for $z$ in $A$ and as zero elsewhere, then $f$ is in $H_1$ and satisfies (1.16).



The theorem obtained from the present one by letting the hypotheses $H_0$ and $H_1$ refer to the class of all probability distributions rather than to a particular class of generalized densities is obviously also true, and the connection between these two theorems is easily formulated.

If $\varphi$ is the most powerful test for testing a hypothesis of invariance $H_0$ referring to a class of generalized densities against an alternative $f$ from this class of densities, then, since this test is valid also for testing the wider hypothesis $H_0'$ referring to the class of all distributions, $\varphi$ is also most powerful for testing $H_0'$ against $f$. The corresponding remark holds for most stringent tests. Therefore all optimum tests that will be derived in the sequel, through the use of theorems of this section, may be considered as tests of hypotheses referring to the class of all distributions: they are valid against these hypotheses, and no power is gained by restricting the hypothesis to the appropriate class of generalized densities.

2. Most powerful tests and most stringent tests. One of the main problems to be considered in this paper is the determination of a most powerful test of a hypothesis of invariance against a simple alternative. If we restrict our considerations to the class of generalized densities with respect to $\mu$, a complete solution of the problem is given by the following

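Theorem 2. Let $H$ be the hypothesis of invariance under the partition $\Pi$, and let $g$ be a probability density function not in $H$. For any $z$ in $S_n$ denote by $z^{(1)}, \ldots, z^{(n)}$ the $n$ points of $T(z)$ arranged so that $g(z^{(1)}) \ge g(z^{(2)}) \ge \cdots \ge g(z^{(n)})$. For testing $H$ against $g$ a most powerful test of size $\epsilon$ is given by

(2.1)  $\varphi(z) = \begin{cases} 1 & \text{if } g(z) > g(z^{(1+[\epsilon n])}), \\ a & \text{if } g(z) = g(z^{(1+[\epsilon n])}), \\ 0 & \text{if } g(z) < g(z^{(1+[\epsilon n])}), \end{cases}$  for $z$ in $S_n$,

where $\sum_{z' \in T(z)} \varphi(z') = \epsilon n$, $0 \le a \le 1$, and where $a$ may depend on $z$ through $T(z)$.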
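The construction in (2.1) can be sketched for a single equivalence class. The Python below is only an illustration, not part of the paper: the function name `most_powerful_weights` and the representation of $T(z)$ as a plain list of density values are assumptions, exact arithmetic with `Fraction` is used so that the structure condition can be checked without rounding, and $0 < \epsilon < 1$ is assumed.

```python
from fractions import Fraction
from math import floor


def most_powerful_weights(g_values, eps):
    """Return phi(z') for every point z' of one equivalence class T(z).

    g_values: values g(z') of the alternative density on the n points of T(z)
              (hypothetical input format, exact numbers such as int/Fraction).
    eps:      level of significance, assumed to satisfy 0 < eps < 1.
    """
    eps = Fraction(eps)
    if not 0 < eps < 1:
        raise ValueError("eps must lie strictly between 0 and 1")
    n = len(g_values)
    k = floor(eps * n)                      # [eps * n]
    ordered = sorted(g_values, reverse=True)
    cutoff = ordered[k]                     # g(z^(1+[eps n])) in 1-based numbering
    above = sum(1 for v in g_values if v > cutoff)
    ties = sum(1 for v in g_values if v == cutoff)
    # Randomization constant a, chosen so that the weights sum to eps * n.
    a = (eps * n - above) / ties
    return [Fraction(1) if v > cutoff else a if v == cutoff else Fraction(0)
            for v in g_values]
```

For density values 5, 3, 3, 1 on a class of four points and $\epsilon = 1/2$, the sketch gives the weights 1, 1/2, 1/2, 0, whose sum is $\epsilon n = 2$, so that (1.5) holds on this class.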


Proof. First we observe that the number of $z^{(i)}$ for which $g(z^{(i)}) \ge g(z^{(1+[\epsilon n])})$ is greater than or equal to $1 + [\epsilon n] > \epsilon n$, and that the number of $z^{(i)}$ for which $g(z^{(i)}) > g(z^{(1+[\epsilon n])})$ is less than or equal to $[\epsilon n] \le \epsilon n$, so that there exists an $a$ between 0 and 1 for which $\sum_{z' \in T(z)} \varphi(z') = \epsilon n$. Since $\varphi$ has structure $S(\epsilon)$, it follows from lemma 1 that $\varphi$ is similar and of size $\epsilon$.

Let

(2.2)  $g^*(z) = g(z^{(1+[\epsilon n])})$ for $z \in S_n$.

To complete the proof consider first the special case that

(2.3)  $\int g^*(z)\, d\mu(z)$

vanishes. Then

(2.4)  $\int \varphi g\, d\mu = \int g\, d\mu = 1$,



that is, the test $\varphi$ has power 1 and therefore is clearly most powerful. Assume next that the integral (2.3) is positive. Then $g^*$ is proportional to a probability density function of $H$. For it is measurable and satisfies the invariance condition required of a member of $H$, and the integral (2.3) is finite since

(2.5)  $\sum_n \int_{S_n} g^*(z)\, d\mu(z) \le \sum_n \frac{1}{\epsilon n} \int_{S_n} \sum_{i=1}^{1+[\epsilon n]} g(z^{(i)})\, d\mu(z) \le \sum_n \frac{1}{\epsilon n} \int_{S_n} \sum_{i=1}^{n} g(z^{(i)})\, d\mu(z) = \frac{1}{\epsilon}$.

The test $\varphi$ therefore has the form of a probability ratio test. Since it is also similar, it follows from theorem 1 of [1] that $\varphi$ is most powerful.
In practice one is usually interested in composite rather than simple alternatives. We shall therefore consider next the problem of deriving most stringent tests of hypotheses of invariance against certain classes of alternatives. This problem may be reduced to that of finding tests which maximize the minimum power over a class of alternatives by the following simple theorem of Hunt and Stein [7].

Theorem 3. Given a hypothesis $H$ and a class of alternatives $\{g_\theta\}$, $\theta \in \Omega$, denote by $\beta^*(\theta)$ the envelope power function corresponding to the level of significance $\epsilon$, that is, let

(2.6)  $\beta^*(\theta) = \sup_{\varphi} \beta(\varphi, \theta)$,

where $\beta(\varphi, \theta)$ stands for the power of the test $\varphi$ against the alternative $g_\theta$ and where the least upper bound is taken over all tests of size $\epsilon$. Let $\{\Omega_\delta\}$ be a class of mutually exclusive subsets of $\Omega$ such that $\bigcup_\delta \Omega_\delta = \Omega$ and such that $\beta^*(\theta)$ is constant on each $\Omega_\delta$. Denote by $\varphi_\delta$ a test which maximizes the minimum power over $\Omega_\delta$. If $\varphi_\delta = \varphi$ is independent of $\delta$, then $\varphi$ is most stringent for testing $H$ against $\Omega$ at level of significance $\epsilon$.

For obtaining tests which maximize the minimum power over a class of alternatives to a hypothesis of invariance, we can use the following

Theorem 4. Let $H$ be a hypothesis of invariance, and let $\{g_\theta\}$, $\theta \in \Omega$, be the class of alternatives. Suppose there exists a subset $\Omega'$ of $\Omega$ and a probability measure $\lambda$ over $\Omega'$ such that for the test $\varphi$ of theorem 2 associated with

(2.7)  $g(z) = \int g_\theta(z)\, d\lambda(\theta)$,

the integral $\int \varphi g_\theta\, d\mu$ is constant for $\theta$ in $\Omega'$, and

(2.8)  $\int \varphi g_\theta\, d\mu \ge \int \varphi g_{\theta'}\, d\mu$ for all $\theta \in \Omega$, $\theta' \in \Omega'$.

Then $\varphi$ maximizes the minimum power over $\Omega$ at level of significance $\epsilon$.


nrr t 


enrelope Pw and^t.halTs'Tr!!'-X“ 



By theorem 2, $\varphi$ is a most powerful test for testing $H$ against $g$, that is, for any test $\varphi'$ of size $\epsilon$

(2.9)  $\int \varphi'(z) g(z)\, d\mu(z) \le \int \varphi(z) g(z)\, d\mu(z)$.

Consequently

(2.10)  $\inf_{\theta \in \Omega} \int \varphi' g_\theta\, d\mu \le \int d\lambda(\theta) \int \varphi'(z) g_\theta(z)\, d\mu(z) = \int \varphi'(z) \int g_\theta(z)\, d\lambda(\theta)\, d\mu(z) \le \int \varphi(z) \int g_\theta(z)\, d\lambda(\theta)\, d\mu(z) = \int d\lambda(\theta) \int \varphi(z) g_\theta(z)\, d\mu(z) = \inf_{\theta \in \Omega} \int \varphi(z) g_\theta(z)\, d\mu(z)$.

3. Normal alternatives. Let $H$ be the hypothesis of invariance under $\Pi$, let $T(z)$ be the set of points equivalent to $z \pmod{\Pi}$, and let $f$ and $g$ be two functions defined over $\mathcal{Z}$. We shall write $f \sim g$ if there exists a function $F$ such that

(3.1)  $f(z) = F[g(z), T(z)]$,

where for any fixed $T(z)$, $F$ is a strictly increasing function of $g$. We note that $f \sim g$ in the following two special cases:

(i) $f(z) = F[g(z)]$ where $F$ is strictly increasing;

(ii) $f(z) = a(z) g(z) + b(z)$ where $a(z) > 0$ for all $z$, and where $z_1 \sim z_2 \pmod{\Pi}$ implies $a(z_1) = a(z_2)$, $b(z_1) = b(z_2)$.

The usefulness of this notation stems from the following remark. Let $g^*$ and $\varphi$ be defined as in (2.2) and (2.1) respectively and let $f \sim g$. If the test $\psi$ is obtained from $\varphi$ by substituting $f$ and $f^*$ for $g$ and $g^*$ respectively, then $\psi = \varphi$.

The purpose of the present section is to obtain most powerful and most stringent tests of some hypotheses of invariance against certain classes of normal alternatives. In particular, problems will be exhibited for which various non-parametric tests suggested in the literature possess these optimum properties.
Problem 1. Suppose that the random variables $Z_{ij}$ $(j = 1, \ldots, s_i;\ i = 1, \ldots, m)$ have a joint probability density function, and denote by $H$ the hypothesis that this probability density is invariant under all permutations of the $s_i$ arguments within the $i$th group for $i = 1, \ldots, m$. Consider the alternative $H_1$ that all variables are independently normally distributed with common variance $\sigma^2$, and that

(3.2)  $E(Z_{ij}) = a x_{ij} + b_i$,

where $a$, the $b$'s and the $x$'s are assumed known and where, without essential loss of generality, we assume $a > 0$. Assume further that

(3.3)  $\sum_{j=1}^{s_i} x_{ij} = 0$.



In order to obtain the most powerful test of $H$ against $H_1$, we apply theorem 2 with

(3.4)  $g(z) = C \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{m} \sum_{j=1}^{s_i} (z_{ij} - a x_{ij} - b_i)^2 \Big] \sim \sum_{i=1}^{m} \sum_{j=1}^{s_i} x_{ij} z_{ij}$.

The most powerful test is therefore given by (2.1), if we replace $g(z)$ by $\sum\sum x_{ij} z_{ij}$. This test being independent of $\sigma^2$, the $b$'s and $a > 0$, it is uniformly most powerful against the class of alternatives obtained from $H_1$ by not specifying the values of these parameters but restricting $a$ to be positive.
If we drop the restriction $a > 0$, a uniformly most powerful test no longer exists. We shall instead obtain the most stringent test against this extended class of alternatives, using theorems 3 and 4. Clearly the envelope power function is constant on the surfaces $|a|/\sigma = \text{constant}$. Take as the $\Omega$ of theorem 4 the set consisting of the two points $(a, b_1, \ldots, b_m, \sigma)$ and $(-a, b_1, \ldots, b_m, \sigma)$. Let $\lambda$ assign the probability $\tfrac{1}{2}$ to each of the two points. Then the function of (2.7) becomes

(3.5)  $g(z) = \tfrac{1}{2} C \Big\{ \exp\Big[ -\frac{1}{2\sigma^2} \sum\sum (z_{ij} - a x_{ij} - b_i)^2 \Big] + \exp\Big[ -\frac{1}{2\sigma^2} \sum\sum (z_{ij} + a x_{ij} - b_i)^2 \Big] \Big\}$
       $\sim \exp\Big[ \frac{1}{\sigma^2} \sum\sum z_{ij}(a x_{ij} + b_i) \Big] + \exp\Big[ \frac{1}{\sigma^2} \sum\sum z_{ij}(-a x_{ij} + b_i) \Big]$
       $\sim \exp\Big[ \frac{a}{\sigma^2} \sum\sum x_{ij} z_{ij} \Big] + \exp\Big[ -\frac{a}{\sigma^2} \sum\sum x_{ij} z_{ij} \Big] \sim \Big| \sum\sum x_{ij} z_{ij} \Big|$.

The power of the test $\varphi$ obtained by substituting this expression for $g$ in (2.1) is the same at both points of $\Omega$. For this test is most powerful for testing $H$ against the simple alternative $H'$ that the density of the $Z$'s is given by the first member of (3.5). But under the transformation $Z'_{ij} = -Z_{ij} + 2b_i$, $H$ and $H'$ and therefore the test $\varphi$ are left invariant, while the two points of $\Omega$ are permuted. Condition (2.8) of theorem 4 is therefore satisfied, and hence $\varphi$ maximizes the minimum power over $\Omega$. Since furthermore $\varphi$ is independent of the particular set $\Omega$ chosen, it follows from theorem 3 that $\varphi$ is most stringent for this problem.

Then =t; nd izT^CTP “ "°‘ *"" d * tel *' “ '■* * ■ 

Therefore the test criterion (3.5) becomes

(3.6)  $\Big| \sum\sum z_{ij}(x_{ij} - \bar{x}_i) \Big| \sim \Big| \sum\sum (z_{ij} - \bar{z}_i)(x_{ij} - \bar{x}_i) \Big|$.

Some special cases of problem 1 are of particular interest.

a) Suppose that the variables of the $i$th group fall into two subgroups, and write $U_{ij}$ for $Z_{ij}$ when $j = 1, \ldots, k_i$, and $V_{ij}$ for $Z_{i,k_i+j}$ when $j = 1, \ldots, l_i$ $(k_i + l_i = s_i)$. Let

(3.7)  $x_{ij} = \begin{cases} 0 & \text{for } j = 1, \ldots, k_i, \\ 1 & \text{for } j = k_i + 1, \ldots, k_i + l_i. \end{cases}$

Then the alternative ascribes to the variables normal distributions with common variance and such that

(3.8)  $E(U_{ij}) = b_i$,  $E(V_{ij}) = b_i + a$.

The criterion becomes

(3.9)  $\sum_{i=1}^{m} \frac{k_i l_i}{s_i} (\bar{v}_i - \bar{u}_i)$

or

(3.10)  $\Big| \sum_{i=1}^{m} \frac{k_i l_i}{s_i} (\bar{v}_i - \bar{u}_i) \Big|$

according as $a$ is restricted to positive values or not.

b) If we specialize still further and let $m = 1$, we are dealing with a problem which would coincide with the two sample problem if we added independence to the assumptions of the hypothesis. (3.10) becomes $|\bar{v} - \bar{u}|$, the criterion suggested by Pitman [3].

c) If instead of $m = 1$ we set $k_i = l_i = 1$ for $i = 1, \ldots, m$, we are testing interchangeability within each pair $(u_i, v_i)$ against normal alternatives under which the means of $U_i$ and $V_i$ are different, the difference being independent of $i$. The criterion $|\sum (v_i - u_i)|$ to which (3.10) reduces was first suggested by R. A. Fisher [1].

d) As a last example set $m = 1$ in the original problem. Under the hypothesis the joint density of $Z_1, \ldots, Z_s$ is symmetric in its $s$ arguments, while under the alternatives the $Z$'s are normally distributed with common variance and mean $a x_i + b$. The criterion reduces to $|\sum (z_i - \bar{z})(x_i - \bar{x})|$, which was proposed by Pitman [3].

We see therefore that several non-parametric tests which have been discussed in the literature are most powerful one-sided or most stringent for testing a hypothesis of invariance against certain classes of normal alternatives. In a later section we shall indicate to what extent these results remain valid if to these hypotheses we add the assumption of independence.
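The criterion $|\bar{v} - \bar{u}|$ of case b) can be evaluated by enumerating the equivalence class of the observed point, that is, all ways of splitting the pooled observations into a group of $k$ and a group of $l$. The following Python is a minimal sketch under stated assumptions, not the authors' procedure: the function name `pitman_two_sample` is hypothetical, and the function returns the proportion of splits whose criterion is at least as large as the observed one (a permutation p-value) rather than the randomized test of exact size $\epsilon$ given by (2.1).

```python
from fractions import Fraction
from itertools import combinations


def pitman_two_sample(u, v, two_sided=True):
    """Proportion of splits of the pooled sample with criterion >= observed.

    u, v: the two observed groups (hypothetical input format).
    two_sided=True uses |v_bar - u_bar|; False uses v_bar - u_bar.
    """
    pooled = list(u) + list(v)
    k, l = len(u), len(v)
    total = sum(pooled)

    def criterion(v_sum):
        # Difference of group means, computed exactly from the sum of the V group.
        diff = Fraction(v_sum) / l - Fraction(total - v_sum) / k
        return abs(diff) if two_sided else diff

    observed = criterion(sum(v))
    hits = 0
    splits = 0
    # Each choice of l positions for the V group is one point of the class.
    for idx in combinations(range(k + l), l):
        splits += 1
        if criterion(sum(pooled[i] for i in idx)) >= observed:
            hits += 1
    return Fraction(hits, splits)
```

Rejecting when the returned proportion is at most $\epsilon$ gives a test whose size does not exceed $\epsilon$; attaining size exactly $\epsilon$ in general requires the randomization constant $a$ of theorem 2. Full enumeration is practical only for small $k + l$.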

The remaining problems will be considered somewhat more briefly since the proofs follow the same pattern as in problem 1.

Problem 2. The conditions of problem 1d) are satisfied in particular if $x_1, \ldots, x_s$ are values taken on by random variables $X_1, \ldots, X_s$ and if under the alternative the pairs $(X_i, Z_i)$ have a common bivariate normal distribution
w dli til •« a«. Wo are then concerned with n pioblcm related to that of testing 
for absence of interclass correlation. For the corresponding intra-class problem, we consider random variables $X_1, \ldots, X_s$; $Z_1, \ldots, Z_s$, and test the hypothesis that the joint density of the $2s$ variables is symmetric in all its arguments, against the alternatives that the pairs $(X_i, Z_i)$ have a common bivariate normal distribution, the means and variances of the $X$'s and $Z$'s being the same. We





H 


shall only considei the cose of • *''liif' r,' wi'llui 

*.A« m the one Bided case a! , „„ 

ir P srr,t™ssitrr»i : , 1 ...y * r r: . 

lf„S by cons,dome all po^lc nays m .hrii . • '<•' '.. 

■fjinm flip POlTlDlPfcO S6fc of 2^ olj£>Gt*^ r fttlOHS< 

{ Z^m 8 Consider once more the hyp*]** Hurt U.* 1 H»> • "» "> ■■ 
2, Z, is symmetric m Us ft wgumunti, and rims«li' r V nll.rn i nv ,« ,-n 

the Z's m normally distribute! with iwlnc <Mod - 

(3 ll) ffW - <? »p|- 5 ? § tfe ' “ E) " i{x 

where= ft. The test based oil this cnlennil, which wim l>r*i|«H*d lev 
Wald and WolfowiU [81, is thciefere most po»ei(ul •W*in"> (| i" «W« •' "1 

Problem 4. As a last problem, we shall test the hypothesis $H$ that the joint density of $Z_1, \ldots, Z_n$ is symmetric in its $n$ arguments and symmetric about each coordinate hyperplane, that is, invariant under the transformations $z_i' = -z_i$, $z_j' = z_j$ for all $j \ne i$, for $i = 1, \ldots, n$. This will be tested against the alternatives that the $Z$'s are independently, identically distributed according to a normal distribution with non-zero mean $\xi$. If we restrict the mean to positive values, we get

(3.12)  $g(z) = C \exp\Big[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (z_i - \xi)^2 \Big] \sim \sum_{i=1}^{n} z_i$.

If on the other hand both positive and negative values are allowed for the mean, the most stringent test is based on the statistic $|\sum z_i|$. This test may be appropriate for some situations in which it is customary to use the sign test.
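For problem 4 the equivalence class of an observed point consists of all permutations and sign changes of its coordinates. Since $\sum z_i$ is unchanged by permutations, the distribution of the criterion over the class can be obtained from the $2^n$ sign patterns alone. The Python below is an illustrative sketch under that observation; the function name `sign_flip_p_value` is an assumption, and, as a simplification, it reports the proportion of sign patterns with criterion at least as large as the observed one instead of the randomized test of (2.1).

```python
from fractions import Fraction
from itertools import product


def sign_flip_p_value(z, two_sided=True):
    """Proportion of sign patterns whose criterion is >= the observed one.

    z: observed values z_1, ..., z_n (hypothetical input format).
    two_sided=True uses |sum z_i|; False uses sum z_i.
    """
    def criterion(values):
        s = sum(values)
        return abs(s) if two_sided else s

    observed = criterion(z)
    magnitudes = [abs(x) for x in z]
    hits = 0
    total = 0
    # Enumerate every assignment of signs to the absolute values.
    for signs in product((1, -1), repeat=len(z)):
        total += 1
        if criterion([s * m for s, m in zip(signs, magnitudes)]) >= observed:
            hits += 1
    return Fraction(hits, total)
```

With observations 1, 2, 3 only the all-positive pattern reaches the observed sum 6, so the one-sided proportion is 1/8 and the two-sided proportion is 2/8.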








4. Binomial and other non-normal alternatives. In the present section we shall be concerned mainly with generalizations of problems 1b) and 1c) of section 3. As described there, the hypotheses referred to the class of all probability densities in the usual sense. However, as was pointed out at the end of section 2, the same tests may be considered as referring to much wider hypotheses. If they are interpreted in this way, it is possible to greatly widen the class of alternatives without destroying the optimum properties of the tests.

Let $Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_n)$ and denote by $\Pi$ the partition under which two points $z$ and $z'$ are equivalent if they are obtainable from each other by a permutation of coordinates. Let $H_0$ be the hypothesis of invariance under $\Pi$. This is a generalization of the hypothesis of complete symmetry referring to a class of probability densities. Consider as alternative the class of distributions defined by

(4.1)  $P(Z \in A) = C \int_A \exp\Big[ \theta_1 \sum x_i + \theta_2 \sum y_i + \sum r(x_i) + \sum r(y_i) \Big] d\mu(z)$,



where the $\theta$'s are any real numbers, where $\mu$ is the $2n$th power of any one-dimensional measure $\nu$ (and therefore invariant under $\Pi$), and where $r$ is any $\mu$-measurable function, subject only to the condition that the integral (4.1) converges when $A$ is taken to be the whole space.

We first consider the one-sided case $\theta_2 > \theta_1$. Using theorem 2 for a particular $\theta_1$, $\theta_2$, $r$ and $\mu$ we then have

(4.2)  $g(z) = C \exp\Big[ \theta_1 \sum x_i + \theta_2 \sum y_i + \sum r(x_i) + \sum r(y_i) \Big]$
       $\sim \theta_1 \sum x_i + \theta_2 \sum y_i = \tfrac{1}{2}(\theta_1 + \theta_2) \sum [x_i + y_i] + \tfrac{1}{2}(\theta_2 - \theta_1) \Big[ \sum y_i - \sum x_i \Big] \sim \sum y_i - \sum x_i$.

Since this test does not depend on $\theta_1$, $\theta_2$, $r$ or $\mu$, it is uniformly most powerful against the one-sided class of alternatives $\theta_2 > \theta_1$.

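Dropping the restriction $\theta_2 > \theta_1$, we apply theorem 4 with $\Omega$ the set consisting of the two points $\theta_1, \theta_2, r, \mu$ and $\theta_2, \theta_1, r, \mu$. At these two points the envelope power function obviously takes on the same value. If for $\lambda$ we select the distribution which assigns equal probabilities to both points, then

(4.3)  $g(z) \sim \exp\Big[ \theta_1 \sum x_i + \theta_2 \sum y_i \Big] + \exp\Big[ \theta_2 \sum x_i + \theta_1 \sum y_i \Big]$
       $\sim \exp\Big\{ \tfrac{1}{2}(\theta_2 - \theta_1) \Big[ \sum y_i - \sum x_i \Big] \Big\} + \exp\Big\{ \tfrac{1}{2}(\theta_1 - \theta_2) \Big[ \sum y_i - \sum x_i \Big] \Big\}$
       $\sim \Big| \sum x_i - \sum y_i \Big| \sim |\bar{x} - \bar{y}|$.

The power of this test clearly is the same against both points of $\Omega$. Since furthermore the test does not depend on the $\theta$'s, $r$, or $\mu$, it is most stringent against $H_1$.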

A univariate distribution such that

(4.4)  $P\{X \in A\} = C \int_A \exp[\theta x + r(x)]\, d\nu(x)$

has been called Laplacian by Tweedie [9], who has studied these distributions in a different connection. Among others, the normal and $\chi^2$, the binomial and Poisson distributions are Laplacian. To obtain, for example, the distribution of a characteristic variable, take for $\nu$ the measure $\nu^*$ which assigns to a set $D$ the value 0, 1 or 2 according as $D$ contains none, one or both of the points $x = 0$ and $x = 1$, and take as density the function

(4.5)  $p^x (1 - p)^{1 - x} = (1 - p) \exp\Big[ x \log \frac{p}{1 - p} \Big]$.

For comparison with tests which have been considered in the literature, one may specialize the problem just considered, so that the hypothesis and the class of alternatives consist only of those members of $H_0$ and $H_1$ which are generalized densities with respect to a fixed measure $\mu$. One can specialize even further and take as alternative any subset of $H_1$ provided that with any point $\theta_1, \theta_2, r$ it also contains the point $\theta_2, \theta_1, r$. The test clearly will not change with these specializations, and the test based on (4.3) will therefore possess the same optimum properties with respect to these special problems as with respect to the problem for which it was originally derived.

If in particular one selects for $\nu$ the measure $\nu^*$ mentioned above, one obtains the problem for which R. A. Fisher proposed the test based on (4.3). It follows that this test, Fisher's exact test, is most stringent in connection with the following problem. The random variables $X_1, \ldots, X_n, Y_1, \ldots, Y_n$ are characteristic variables, that is, they can take on only the values 0 and 1. If we let

(4.6)  $P\{X_1 = x_1, \ldots, Y_n = y_n\} = P(x_1, \ldots, y_n)$,

the hypothesis states that the function $P$ is invariant under all permutations of its arguments. An equivalent formulation is that the probability (4.6) depends only on $\sum x_i + \sum y_i$, the total number of "successes". Fisher's exact test is most stringent against the alternative that the $X$'s and $Y$'s are samples from two distinct populations of characteristic variables, that is, two populations corresponding to distinct probabilities of success.
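For characteristic variables the permutation distribution of the criterion can be written down in closed form. Under invariance, and given the total number $t$ of successes, every arrangement of the $t$ ones among the $2n$ coordinates is equally likely, so the number of successes among the $Y$'s is hypergeometric. The Python below is a sketch of the resulting computation; the function name `fisher_exact_equal_n` is an assumption, equal group sizes are assumed as in the text, and the function returns a tail proportion rather than the randomized test of exact size $\epsilon$.

```python
from fractions import Fraction
from math import comb


def fisher_exact_equal_n(x, y, two_sided=True):
    """Tail proportion of the criterion over all arrangements of the successes.

    x, y: sequences of 0/1 values of equal length n (hypothetical input format).
    two_sided=True uses |sum y - sum x|; False uses sum y - sum x.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("both groups must have the same number of trials")
    t = sum(x) + sum(y)                     # total number of successes

    def criterion(k):
        # k successes among the Y's leaves t - k among the X's.
        d = k - (t - k)
        return abs(d) if two_sided else d

    observed = criterion(sum(y))
    # Hypergeometric weights: comb(n, k) * comb(n, t - k) arrangements give k.
    favourable = sum(comb(n, k) * comb(n, t - k)
                     for k in range(max(0, t - n), min(n, t) + 1)
                     if criterion(k) >= observed)
    return Fraction(favourable, comb(2 * n, t))
```

With four failures among the $X$'s and four successes among the $Y$'s, only the two extreme arrangements reach the observed criterion, so the two-sided proportion is 2/70.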

Problem 1c) of section 3 can be extended quite analogously. Put again $Z = (X_1, \ldots, X_n, Y_1, \ldots, Y_n)$, and denote by $\Pi$ the partition under which two points $z$ and $z'$ are equivalent provided they can be obtained from each other by a permutation of coordinates in which only the coordinates within pairs $(X_i, Y_i)$ are interchanged. Consider the hypothesis of invariance under $\Pi$ with reference to the class of all distributions and as alternative the class of distributions given by

(4.7)  $P(Z \in A) = C \int_A \exp\Big\{ \sum [\theta_1 x_i + \theta_2 y_i + r(x_i, y_i)] \Big\} d\mu(z)$.

The $\theta$'s here are any real numbers, $\mu$ is the $2n$th power of any one-dimensional measure $\nu$, and $r$ is any $\mu$-measurable function such that (a) the integral (4.7) converges when $A$ is the whole space, and such that (b) $r(x, y) = r(y, x)$. Clearly in the one-sided case $\theta_2 > \theta_1$ we will again find $g(z) \sim \sum (y_i - x_i)$, so that the associated test is uniformly most powerful against this one-sided class of alternatives.

“ 1-1 - „ -, 1 „ .. 

generalised denies g.vcn by ’ ° '' Hnf " (,t ' r t)l " 


( 48 ) 


p\z ,a\ = ^ i>t A i , iin)(((lW 


£? '*»*•" ** » « mvminrn 

transformations «d under llic group rouoi-MiM by ,1m 

' ““ .1,1 „■; 



distribution equals $\tfrac{1}{2}$. The test of $H_0$ against the alternatives that $Z_1, \ldots, Z_n$ is a sample of a characteristic variable is based on $\sum z_i$ or $|\sum z_i|$ according as $P\{Z_i = 1\}$ is restricted to be greater than $\tfrac{1}{2}$ or is not so restricted. In the first case the test is most powerful, in the second most stringent.

5. Hypotheses of invariance for independent variables. To the results obtained so far, a different interpretation can be given, which throws some light on certain related problems. Theorem 2 gave sufficient conditions for a test to be most powerful against a simple alternative $H_1$ for the hypothesis $H_0$ of invariance under a partition $\Pi$. However, if taken in conjunction with section 1, the theorem can be interpreted as giving sufficient conditions for a test to be the most powerful test of structure $S(\epsilon)$ with respect to $\Pi$ against $H_1$. That is, the theorem is really independent of the hypothesis, and depends solely on the alternative and on the class of tests admitted into competition, in our case the class of all tests having structure $S(\epsilon)$ with respect to $\Pi$. The same remark obviously also applies to most stringent tests.

Let us now consider a special class of partitions. Let $Z$ stand for the $m$ groups of random variables $(Z_{i1}, \ldots, Z_{is_i})$ $(i = 1, \ldots, m)$ and let $\Pi$ denote the partition under which two points $z$ and $z'$ are equivalent provided they can be obtained from each other by a permutation of coordinates which however permutes only the coordinates within the $m$ groups. Let $\mu$ be the power of a one-dimensional measure $\nu$, and assume that the probability distribution of $Z$ is absolutely continuous with respect to $\mu$ and that the $Z$'s are independently distributed, so that

(5.1)  $P\{Z \in A\} = \int_A \prod_{i,j} \big[ f_{ij}(z_{ij})\, d\nu(z_{ij}) \big]$.

Under these assumptions consider the hypothesis $H$ that $f_{ij}$ is independent of $j$, that is, that the $Z$'s are identically distributed within each group. It easily can be shown that not all admissible tests of $H$ that have size $\epsilon$ have structure $S(\epsilon)$. However, a generalization of a result of Feller [10] and Scheffé [5] for the case $m = 1$ and $\mu$ = Lebesgue measure states that the only tests which are of size $\epsilon$ and similar for $H$ are the tests of structure $S(\epsilon)$ with respect to $\Pi$ [11]. It follows that any test which is most powerful or most stringent for testing the hypothesis $H'$ of invariance under $\Pi$ for the class of generalized densities with respect to $\mu$ has the same property relative to the class of all tests which are similar for testing $H$.

As an example, take problem 1b) of section 3. Here $\mu$ is Lebesgue measure, $m$ is 1, and we put

(5.2)  $Z_j = \begin{cases} U_j & \text{for } j = 1, \ldots, k, \\ V_{j-k} & \text{for } j = k + 1, \ldots, k + l = s. \end{cases}$

It was shown in section 3 that the test based on $|\bar{u} - \bar{v}|$, Pitman's test, is most stringent for testing the hypothesis that the joint density of the $U$'s and $V$'s is symmetric in its $k + l$ arguments against the alternative that the variables are independently normally distributed with common variance and that $E(U_i) = \xi$, $E(V_i) = \eta$, where $\xi$ and $\eta$ are any distinct real numbers. It follows now that the same test is most stringent similar for testing against the same class of alternatives the hypothesis that $U_1, \ldots, U_k, V_1, \ldots, V_l$ are independently distributed, all with the same probability density. This is the hypothesis for which Pitman proposed his test, and the result just obtained is a partial solution of the problem recently raised by Wilks [12], to determine the class of alternatives for which Pitman's test is satisfactory.

If we modify the example by taking for $\mu$ instead of Lebesgue measure the $(k + l)$th power of the measure $\nu^*$ of section 4, we are dealing with characteristic variables $U_1, \ldots, U_k, V_1, \ldots, V_l$. We have shown earlier that if $k = l$ the test based on $|\bar{u} - \bar{v}|$ is most stringent for testing the hypothesis of complete permutability against the alternative that the $U$'s and $V$'s are samples from two distinct populations of characteristic variables. If we add to the hypothesis the assumption of independence of all variables, we obtain a parametric problem, namely essentially the problem of testing equality of probability of success in two binomial populations corresponding to the same number of trials. It now follows that the test based on $|\bar{u} - \bar{v}|$ is most stringent for this problem. As is well known, it is also the uniformly most powerful unbiased similar test.

These two examples suffice to illustrate the type of result that can be obtained. It should perhaps be mentioned that the equivalence described at the beginning of this section can be utilized also in the opposite direction. The fact, for example, that the test based on $|\bar{u} - \bar{v}|$ is known to be uniformly most powerful unbiased similar for testing equality of probability of success in two populations of characteristic variables, from which the $U$'s and $V$'s are samples, implies that this test is uniformly most powerful unbiased for testing the hypothesis of complete symmetry for the joint generalized density of the $U$'s and $V$'s.

6. Extension to infinite equivalence classes. The definition of a hypothesis

\T EP11 ° °‘ th ° rostr ' in,on 10 «,nivi,linen 

parametric theory P f 1 P 1080 " 1 Plll>,!r “ ml U "' »UnrlanJ 

detoe a t Z ZtlL^ ™ It' ° nta * pf * - W« 

for each telnets, be a mon/im w* ^ «■’ Men's: Lei ,Tlio annul mint p, awl 

the S, are * * t « « ( «) U. that 

which can be expressed in the form tlV °' ^ l!C U ' C ° f B ' 1 Cs ‘ U 

( 61 ) 





itiul l<f *i\) Ik 1 Mm dim* of all /> 0 orcurimg m such ldationships For each 
/ E ') h f <7, ltd a spf't ifiml probability unaware ovei Cfi»where (f, is the class of 
l, Midi flint If i i 1 1 €tf Let Z Ik* a laudom viuinblc distributed ovei % 
according to an unknown probability measure $F$. Let $t(z)$ be that $t \in \mathcal{T}$ for
which $z$ is in $S_t$, and let $T = t(Z)$. Let $H_0$ be the hypothesis that for each $t \in \mathcal{T}$
Lho * nmlilumid dihtubution of Z given Z 4 Ah is Gi, l e that theie exists a piobn- 
InlitN nn'iwurr Qa n\er C a hu h that for nil A «Cf 

(6.2)  $F(A) = \int G_t(A \cap S_t)\, dQ_0(t)$.

It is seen that we have essentially the situation described in section 1, except that there we assumed further that each $S_t$ was finite and that, for all $t$, $G_t$ assigned equal probabilities to all points of $S_t$.

We say that a test $\varphi$ of $H_0$ has structure $S(\epsilon)$ if the conditional expectation of $\varphi(Z)$ given $Z \in S_t$ satisfies

(6.3)  $E[\varphi(Z) \mid Z \in S_t] = \int_{S_t} \varphi\, dG_t = \epsilon$ for all $t$.

The lemmas and theorems stated below are straightforward generalizations of those in section 1, so that no proof will be given.

Lemma 1'. Any test $\varphi$ of structure $S(\epsilon)$ with respect to $H_0$ is similar and of size $\epsilon$ for $H_0$.

Lemma 2'. If $\varphi$ is any test of $H_0$ of size $\le \epsilon$, there exists a test $\varphi_1$ of $H_0$ having structure $S(\epsilon)$ and such that

(6.4)  $\int \varphi_1\, dF \ge \int \varphi\, dF$

for all probability measures $F$ for which the conditional distribution of $Z$ given $Z \in S_t$ is absolutely continuous with respect to $G_t$ for all $t$.
Suppose next (bore is defined anotlier partition of % into sets (£(,] by means 
of a Rjuictt c ll, and let (fl, £l\ mid (3L refer to this second partition Wo shall 
illume Hurt far every (< T, u t'-’U cither A'!, C Ah or A'i 0 Ah is empty Let <?* 
be a HptwiM probability meusuio over Cf u and suppose that for each t e J* 
lieu* e\inlti n probability measure Q, such that for all A ( e CL 

(0 5) 0,(4,) - f 0',(A, fl S'M(u) 

If Hi denote* tho hypothesis that for each u «the conditional distribution of 
Z given Z e iS*' M m Gl , v 0 can slato 

Theorem 1'. For testing $H_0$ against $H_1$ at level of significance $\epsilon$, the totality of tests $\varphi$ which have structure $S(\epsilon)$ and for which $z, z' \in S'_u$ implies $\varphi(z) = \varphi(z')$ form an essentially complete class of admissible tests.
Let $F_1$ be a distribution not in $H_0$, and for each $t \in \mathcal{T}$ let $G_{1t}$ be the conditional distribution of $Z$ given $Z \in S_t$. We suppose that for each $t \in \mathcal{T}$, $G_{1t}$ is chosen to be a true probability measure, which is possible in most cases of practical interest (see Doob [13] for a discussion of this point). Then we have the equivalent of theorem 2:

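Theorem 2'. Let

(6.6)  $G_{1t}(A_t) = \int_{A_t - H_t} g_t\, dG_t + G_{1t}(A_t \cap H_t)$

for all $A_t \subset S_t$, where in accordance with the Radon-Nikodym theorem [14], $g_t$ is a non-negative function integrable over $S_t$, and $H_t \subset S_t$ has $G_t$ measure 0 and does not depend on $A_t$. For testing $H_0$ against $H_1$, a most powerful test of size $\epsilon$ is given by $\varphi(z) = \varphi_t(z)$ for $z \in S_t$, where

(6.7)  $\varphi_t(z) = \begin{cases} 1 & \text{if } z \in H_t, \\ 1 & \text{if } g_t(z) > c_t, \\ a_t & \text{if } g_t(z) = c_t, \\ 0 & \text{if } g_t(z) < c_t, \end{cases}$

where $c_t$ and $a_t$ are so chosen that $\varphi$ has structure $S(\epsilon)$.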

Theorems 3 and 4 require no modification.

As in the case of finite equivalence classes the results just outlined can be interpreted differently. Again the theorems are really independent of the hypotheses, but depend only on the alternatives and on the class of tests admitted into competition. This class of tests $\varphi$ is in the present case defined by condition (6.3), that the conditional expectation of $\varphi$ given $Z \in S_t$ equals $\epsilon$. But this is just the condition which in the standard approach to the problem of testing a composite parametric hypothesis for which $T$ is a sufficient statistic, i.e. by means of similar regions, is frequently found to be the necessary and sufficient condition for $\varphi$ to be similar. (See for example [15].) For these therefore the hypotheses of the present section represent non-parametric analogues to which the same tests apply, with the same optimum properties but without the restriction to similar regions.

As a simple illustration of this remark, let $Z = (Z_1, \ldots, Z_n)$ and

$T = \sum Z_i^2.$

For the conditional distribution of $Z$ given $T = t$ take the uniform distribution over the sphere $T = t$, and take Lebesgue measure. Then the hypothesis $H$ states merely that the joint probability density of $Z_1, \ldots, Z_n$ is a function only of $\sum Z_i^2$. If we add to this the assumption of independence of the $Z_i$ we obtain the new hypothesis $H'$,

normal distribution with aeio mean Tim ^ C t H ,fC . a mun ^ D ^ rn,,t fi 
expectation ovei each snheio i s v or ^mditumrd 

and the only admissible sLilar tests of H' h'^ ' 0 ,? y ^ "T « 

the » are a sampfe from a normal d.«ut,» ^2^ " ZlS ^ 

(6,8) ii v 

- c 



is uniformly most powerful for $H$ and uniformly most powerful similar for $H'$. If we do not restrict $\xi$ to positive values, the test



i 



Student's test, is uniformly most powerful unbiased and most stringent for testing $H$, uniformly most powerful unbiased similar and most stringent similar for testing $H'$.


REFERENCES

[1] R. A. Fisher, Design of Experiments, Oliver and Boyd, Edinburgh, 1935.
[2] J. Neyman, K. Iwaszkiewicz, and St. Kolodziejczyk, "Statistical problems in agricultural experimentation," Roy. Stat. Soc. Jour., Suppl., Vol. 2 (1935), p. 107.
[3] E. J. G. Pitman, "Significance tests which may be applied to samples from any population," Roy. Stat. Soc. Jour., Suppl., Vol. 4 (1937), p. 119; "II. The correlation coefficient test," Roy. Stat. Soc. Jour., Suppl., Vol. 4 (1937), p. 225; "III. The analysis of variance test," Biometrika, Vol. 29 (1938), p. 322.
[4] E. L. Lehmann and C. Stein, "Most powerful tests of composite hypotheses. I. Normal distributions," Annals of Math. Stat., Vol. 19 (1948).
[5] H. Scheffé, "On a measure problem arising in the theory of non-parametric tests," Annals of Math. Stat., Vol. 14 (1943), p. 227.
[6] A. Wald, "An essentially complete class of admissible decision functions," Annals of Math. Stat., Vol. 18 (1947), p. 549.
[7] G. Hunt and C. Stein, "Most stringent tests of statistical hypotheses," unpublished.
[8] A. Wald and J. Wolfowitz, "An exact test for randomness in the non-parametric case, based on serial correlation," Annals of Math. Stat., Vol. 14 (1943), p. 378.
[9] M. C. K. Tweedie, "Functions of a statistical variate with given means, with special reference to Laplacian distributions," Camb. Phil. Soc., Proc., Vol. 43 (1947).
[10] W. Feller, "Note on regions similar to the sample space," Stat. Res. Memoirs, Vol. 2 (1938), p. 117.
[11] E. L. Lehmann and H. Scheffé, "Completeness, similar regions and unbiased estimation," unpublished.
[12] S. S. Wilks, "Order statistics," Amer. Math. Soc. Bull., Vol. 54 (1948), p. 6.
[13] J. L. Doob, "Asymptotic properties of Markoff transition probabilities," Trans. Amer. Math. Soc., Vol. 63 (1948), footnote p. 399.
[14] S. Saks, Theory of the Integral, Stechert, 1937.
[15] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Roy. Soc. London Phil. Trans., Ser. A, Vol. 231 (1933), p. 289.
[16] A. Wald, On the Principles of Statistical Inference, Notre Dame Mathematical Lectures, Number 1.



ESTIMATION OF THE PARAMETERS OF A SINGLE EQUATION IN A
COMPLETE SYSTEM OF STOCHASTIC EQUATIONS¹

By T. W. Anderson² and Herman Rubin³

Columbia University and Institute for Advanced Study


1. Summary. A method is given for estimating the coefficients of a single equation in a complete system of linear stochastic equations (see expression (2.1)), provided that a number of the coefficients of the selected equation are known to be zero. Under the assumption of the knowledge of all variables in the system and the assumption that the disturbances in the equations of the system are normally distributed, point estimates are derived from the regressions of the jointly dependent variables on the predetermined variables (Theorem 1). The vector of the estimates of the coefficients of the jointly dependent variables is the characteristic vector of a matrix involving the regression coefficients and the estimate of the covariance matrix of the residuals from the regression functions. The vector corresponding to the smallest characteristic root is taken. An efficient method of computing these estimates is given in section 7. The asymptotic theory of these estimates is given in a following paper [2].

When the predetermined variables can be considered as fixed, confidence regions for the coefficients can be obtained on the basis of small sample theory (Theorem 3).

A statistical test for the hypothesis of over-identification of the single equation can be based on the characteristic root associated with the vector of point estimates (Theorem 2) or on the expression for the small sample confidence region (Theorem 4). This hypothesis is equivalent to the hypothesis that the coefficients assumed to be zero actually are zero. The asymptotic distribution of the criterion is shown in a following paper [2] to be that of $\chi^2$.


2. A complete system of linear difference equations. In many fields of study such as economics, biology, and meteorology the occurrence of sequences of the observed quantities can be described in terms of a probability model which, as a first approximation, is a set of stochastic equations. Consider a (row) vector $y_t$ of quantities which are observed at time $t$. Suppose that these quantities are jointly dependent on a vector $z_t$ of quantities "predetermined" at time $t$ (i.e., known without error at time $t$). Some of the coordinates of $z_t$ may be coordinates


¹ This paper will be included in Cowles Commission Papers, New Series, No. 36.
sioh-Hrl^w th ? ^ at mooliiiKa of the IniUim* «f Minima* 

’ ’ " 151 lm Cli«l.wr) .1,1 in III,,,-., N \ . 

ot IK rou " ihi,on ' ,,r ™" M * ni 

semhtToo^r 11 FcUo "' EcM “ th C“«»'Unt ol tht Conic. Crnmnicin,, for fi«. 



of $y_{t-1}$, $y_{t-2}$, etc.; other coordinates of $z_t$ are quantities which are assumed given. The coordinates of the set of vectors $y_t$ ($t = 1, 2, \ldots, T$) are called endogenous. The part of the set $z_t$ which does not consist of lagged endogenous variables is called exogenous; these are treated as "fixed variates." For convenience we shall think of $t$ as indicating a point of time, although it may in many cases indicate the ordering of a sample in another dimension, or, indeed, the $t$ may indicate simply a numbering of the observations (if $z_t$ is entirely exogenous). In a dynamic economic model the endogenous variables are economic quantities such as amount of investment, interest rate, amount of consumption, etc. The exogenous variables are those quantities which are considered to be determined primarily outside the economic system, such as amount of rainfall, amount of government expenditures, time, etc.

A simple probability model may be set up on the assumption that these quantities approximately satisfy certain linear equations. Specifically the model is

(2.1)  $B_{yy} y_t' + \Gamma_{yz} z_t' = \epsilon_t',$

where $\epsilon_t$ is a (row) vector having a probability distribution with expected value zero and $B_{yy}$ and $\Gamma_{yz}$ are matrices, the former being non-singular. Primes (') indicate transposition of vectors and matrices. If there are $G$ jointly dependent variables, there are $G$ component equations in (2.1); that is, there are as many equations as there are variables depending on the system. The fact that $y_t$ and $z_t$ do not satisfy linear equations exactly is indicated by setting the linear forms not equal to zero, but equal to random elements, called disturbances.
We will call the component equations of (2.1) structural equations, for they express the structure of the system. For example, one equation involving the amount of goods consumed, the prices of these goods, the size of the national income, etc., might describe the behaviour of the consumers. Another equation involving interest rate might relate to the behaviour of investors.
It has been shown [7], [11] that in general one cannot use ordinary regression methods to estimate the matrices $B_{yy}$ and $\Gamma_{yz}$ and the parameters of an assumed distribution of the disturbances. Mann and Wald [9], for a special class of systems, and Koopmans, Rubin, and Leipnik [11], in a more general case, have obtained maximum likelihood estimates of all of the parameters for the case of the $\epsilon_t$ having a normal multivariate distribution.
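Since $B_{yy}$ is non-singular, we can rewrite (2.1) in a different form, called the reduced form,

(2.2)  $y_t' = -B_{yy}^{-1}\Gamma_{yz} z_t' + B_{yy}^{-1}\epsilon_t',$

or as

(2.3)  $y_t' = \Pi_{yz} z_t' + \eta_t',$

where

(2.4)  $\Pi_{yz} = -B_{yy}^{-1}\Gamma_{yz},$

(2.5)  $\eta_t' = B_{yy}^{-1}\epsilon_t'.$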

If $\epsilon_t$ has a normal distribution, so does $\eta_t$. For a given $t$ then, we can consider the model as specifying a distribution of $y_t$ with conditional expected value $z_t\Pi_{yz}'$.

It is clear that we can multiply (2.1) on the left by any non-singular matrix and obtain a system of equations which defines the same distribution of $y_t$. On the other hand, it has been shown that the only transformations of (2.1) which preserve the linearity of the system of equations are multiplications on the left by non-singular matrices. If there are a priori restrictions on $(B_{yy}\ \Gamma_{yz})$, the set of matrices which result in new coefficient matrices satisfying these restrictions is correspondingly decreased. If the set of admissible matrix multipliers includes only diagonal matrices the system of structural equations is said to be identified. In this case only multiplication of all coefficients of an equation by a given constant is permitted.

Knowledge of the distribution of $y_t$ given $z_t$ is obviously equivalent to knowledge of $\Pi_{yz}$ in (2.3) and the distribution of $\eta_t$. When the system is identified, the matrix $B_{yy}$ and

(2.6)  $\Gamma_{yz} = -B_{yy}\Pi_{yz}$

are determined uniquely except for multiplication on the left by a diagonal matrix. Thus identification of a system is equivalent to the possibility of inferring the structural equations from knowledge of the distribution. The estimation of all coefficients of $B_{yy}$ and $\Gamma_{yz}$ has been considered in [11].
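
The statement that multiplying (2.1) on the left by a non-singular matrix leaves the reduced form unchanged can be checked numerically. The sketch below is an illustration only and is not part of the original paper: it assumes a hypothetical system with $G = 2$ and $K = 2$, the coefficient values are made up, and the helper names (`matmul`, `inv2`, `reduced_form`) are illustrative. Exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inv2(m):
    """Inverse of a 2 x 2 matrix; raises if the matrix is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def reduced_form(B, Gamma):
    """Return Pi = -B^{-1} Gamma, as in (2.4)."""
    return [[-x for x in row] for row in matmul(inv2(B), Gamma)]


# Hypothetical structural coefficients (illustrative values only).
B = [[F(1), F(-2)], [F(3), F(1)]]
Gamma = [[F(1), F(0)], [F(0), F(2)]]
A = [[F(2), F(1)], [F(1), F(1)]]  # an arbitrary non-singular multiplier

Pi = reduced_form(B, Gamma)
Pi_transformed = reduced_form(matmul(A, B), matmul(A, Gamma))
print(Pi == Pi_transformed)
```

Both structural systems give the same $\Pi_{yz}$, which is why restrictions on the coefficients are needed before an equation can be identified.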

3. A single identified equation of a complete system. In many studies the investigator may be interested only in a specific equation of the system, say,

(3.1)  $\beta_* y_t' + \gamma_* z_t' = \zeta_t,$

where $\zeta_t$ is a scalar disturbance. The investigator may not be interested in the entire system (2.1) of which (3.1) is one component. Since a considerable amount of computation is necessary to estimate all parameters of a complete system, there arises the problem of estimating only the coefficients of a single equation. It is desirable to do this with the least possible restrictive assumptions about the part of the system which is not the selected structural equation. In order to treat the selected equation at all, we require that it is identified, that is, that there are certain restrictions on $(\beta_*, \gamma_*)$ such that no linear combination of rows of $(B_{yy}\ \Gamma_{yz})$ satisfies these restrictions other than a constant times $(\beta_*, \gamma_*)$. It is not necessary to assume that every component equation is identified, that is, that the entire system is identified.

We shall suppose that the restrictions imposed are that certain coefficients are zero. We can arrange the components of the vectors so that the restrictions are

(3.2)  $(\beta_*, \gamma_*) = (\beta, 0, \gamma, 0),$

where

(3.3)  $\beta = (\beta^1, \ldots, \beta^H)$

has $H$ coefficients not assumed to be zero and

(3.4)  $\gamma = (\gamma^1, \ldots, \gamma^F)$

has $F$ coefficients not assumed to be zero.
It will be convenient to divide the $G$ components of $y_t$ into two groups (in number $H$ and $G - H$, respectively), and the $K$ components of $z_t$ into two groups (in number $F$ and $D$, respectively) according to whether or not the components enter into (3.1) with coefficients not assumed to be zero. Let

(3.5)  $y_t = (x_t, r_t),$

(3.6)  $z_t = (u_t, v_t),$

where

(3.7)  $x_t = (x_{t1}, \ldots, x_{tH}),$

(3.8)  $r_t = (r_{t1}, \ldots, r_{t,G-H}),$

(3.9)  $u_t = (u_{t1}, \ldots, u_{tF}),$

(3.10)  $v_t = (v_{t1}, \ldots, v_{tD}).$

Then the selected equation is

(3.11)  $\beta x_t' + \gamma u_t' = \zeta_t.$

Now let us see how the identification is accomplished. Partitioning $\Pi_{yz}$ into $H$ and $G - H$ rows and $F$ and $D$ columns as

$\Pi_{yz} = \begin{pmatrix} \Pi_{xu} & \Pi_{xv} \\ \Pi_{ru} & \Pi_{rv} \end{pmatrix},$

we can write the reduced form (2.3) as

(3.12)  $x_t' = \Pi_{xu} u_t' + \Pi_{xv} v_t' + \delta_t',$

(3.13)  $r_t' = \Pi_{ru} u_t' + \Pi_{rv} v_t' +$ (the last $G - H$ coordinates of $\eta_t'$),

where $\delta_t$ consists of the first $H$ coordinates of $\eta_t$.

Multiplying the above equation with $(\beta, 0)$ we obtain

(3.14)  $\beta x_t' = \beta\Pi_{xu} u_t' + \beta\Pi_{xv} v_t' + \beta\delta_t'.$

Since this must be identical to (3.11) we must have

(3.15)  $\gamma = -\beta\Pi_{xu},$

(3.16)  $0 = \beta\Pi_{xv}.$

The matrices $\Pi_{xu}$ and $\Pi_{xv}$ are defined by the distribution of $x_t$ given $u_t$, $v_t$ (for at least $K = D + F$ linearly independent values of $(u_t, v_t)$), so the equation (3.11) is identified if and only if the solution of (3.15) and (3.16) for $\beta$ and $\gamma$ is unique except for a constant of proportionality. This depends on the rank of $\Pi_{xv}$ being $H - 1$. Thus a necessary and sufficient condition that (3.11) be identified is that the rank of the matrix of regression coefficients of $x_t$ on $v_t$ be $H - 1$. In particular this implies that the number of coordinates of $v_t$ (the number of zero coefficients in $\gamma_*$) be at least $H - 1$. It can easily be shown that this condition is equivalent to requiring that the rank of the matrix obtained by selecting the $G - H$ columns of $B_{yy}$ and the $D$ columns of $\Gamma_{yz}$ corresponding to the coefficients assumed zero in the selected equation is $G - 1$. This is the condition given by Koopmans and Rubin [11]. Other homogeneous linear restrictions can be put in this form.
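
To make the rank condition concrete, the following sketch solves (3.15) and (3.16) in the smallest non-trivial case. It is an illustration only: it assumes $H = 2$ and $D = 1$ (so that $\Pi_{xv}$ is a single column), uses the normalization $\beta^1 = 1$, and the function name `solve_identified` and the numerical values are hypothetical.

```python
from fractions import Fraction


def solve_identified(pi_xu, pi_xv):
    """Solve beta * Pi_xv = 0 and gamma = -beta * Pi_xu for H = 2, D = 1.

    pi_xv is the single column [p1, p2] of Pi_xv; pi_xu is a 2 x F matrix
    given as a list of two rows. beta is normalized so that beta^1 = 1.
    """
    p1, p2 = pi_xv
    if p1 == 0 and p2 == 0:
        # Rank of Pi_xv is 0 rather than H - 1 = 1: beta is not unique.
        raise ValueError("equation is not identified")
    if p2 == 0:
        # Then beta^1 must be zero, so this normalization cannot be used.
        raise ValueError("normalization beta^1 = 1 is not possible")
    beta = [Fraction(1), -Fraction(p1) / Fraction(p2)]
    n_cols = len(pi_xu[0])
    gamma = [-(beta[0] * pi_xu[0][j] + beta[1] * pi_xu[1][j]) for j in range(n_cols)]
    return beta, gamma


# Hypothetical reduced-form coefficients with F = 1.
beta, gamma = solve_identified(pi_xu=[[1], [3]], pi_xv=[2, 4])
print(beta, gamma)
```

When $D > H - 1$ and $\Pi_{xv}$ is replaced by a sample regression matrix, the rank is usually $H$ and no non-zero solution exists, which is the difficulty the estimation method of the paper addresses.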

If the vector $\epsilon_t$ is normally distributed with mean zero the vector $\delta_t$ is normally distributed with mean zero. Let the covariance matrix of $\delta_t$ be $\Omega_{xx}$. Then the variance of $\zeta_t = \beta\delta_t'$ is

(3.17)  $\sigma^2 = \beta\Omega_{xx}\beta'.$

The constant of proportionality in $\beta$ may be determined by setting the variance of $\zeta_t$, $\sigma^2 = 1$; another normalization is

(3.18)  $\beta^i = 1,$

where $\beta^i$ is the $i$th coordinate of $\beta$. In general the normalization can be written as

(3.19)  $\beta\Phi_{xx}\beta' = 1,$

where $\Phi_{xx}$ can be either a known constant matrix or can be a known function of unknown parameters.

As an estimation procedure for $\beta$ and $\gamma$ in the case of $D = H - 1$, M. A. Girshick suggested in an unpublished note that one solve equations (3.15) and (3.16) with $(\Pi_{xu}, \Pi_{xv})$ replaced by $(P_{xu}, P_{xv})$, the sample regression of $x$ on $u$ and $v$. By these means Girshick found confidence regions (see section 8) for the

o m tV ° emt ' 0n syat ° m A a,mihr ldca lie “ W " ,ld 11 ,ni ' ,h '" 1 of 


The present paper develops a method for handling the case of $D > H - 1$. In this case the rank of $P_{xv}$ is usually $H$, thus giving no admissible estimate of $\beta$. The proposed method follows the approach used in discriminant problems.

In a second paper [2] the present authors shall give asymptotic properties of these estimates that give a certain justification for the use of them. Under very general assumptions concerning the $\epsilon_t$ and the $z_t$ we prove that these estimates are consistent. These hypotheses permit the investigator to dispense with the assumption that the disturbances are normally distributed, or even that they have identical distributions.


of the ent“ 





by the matrix $(\Pi_{xu}\ \Pi_{xv})$ of regression coefficients of $x_t$ on $u_t$ and $v_t$. The interdependence of the coordinates of $x_t$ indicated by the selected equation nullifies the dependence on $v_t$, that is,

(4.1)  $\beta\Pi_{xv} = 0.$

Suppose we wish to estimate $\beta$ and $\gamma$ from a sample of $T$ observations: $(x_1, z_1), (x_2, z_2), \ldots, (x_T, z_T)$. The information we need can be summarized in the second order moment matrices $M_{xx}$, $M_{xu}$, $M_{xv}$, $M_{uu}$, $M_{uv}$, $M_{vv}$, where, for example, $M_{xu} = \frac{1}{T}\sum_{t=1}^T x_t' u_t$.

Since one coordinate of $u_t$ may be unity there is no advantage in taking these moments about the mean. We shall find it more convenient to use instead of $v_t$ the part of $v_t$ that is orthogonal to $u_t$, that is, we shall use

(4.5)  $s_t' = v_t' - M_{vu}M_{uu}^{-1}u_t'.$

The moments are then $M_{xx}$, $M_{xu}$, $M_{uu}$, $M_{su} = 0$,

(4.6)  $M_{xs} = M_{xv} - M_{xu}M_{uu}^{-1}M_{uv},$

and

(4.7)  $M_{ss} = M_{vv} - M_{vu}M_{uu}^{-1}M_{uv}.$
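
The orthogonalization (4.5) can be illustrated in the simplest case. The sketch below is not from the paper; it assumes $F = D = 1$, so that $u_t$ and $v_t$ are scalars, and the function name `orthogonalize` and the data are made up. When $u_t$ is identically unity, $s_t$ reduces to the deviation of $v_t$ from its sample mean.

```python
def orthogonalize(v, u):
    """Return s_t = v_t - (M_vu / M_uu) u_t for scalar series v and u, as in (4.5)."""
    T = len(v)
    m_vu = sum(a * b for a, b in zip(v, u)) / T
    m_uu = sum(b * b for b in u) / T
    if m_uu == 0:
        raise ValueError("M_uu is singular")
    coef = m_vu / m_uu
    return [a - coef * b for a, b in zip(v, u)]


# Made-up data: u_t is the constant 1, so s_t is v_t minus its mean.
u = [1.0, 1.0, 1.0, 1.0]
v = [1.0, 2.0, 3.0, 6.0]
s = orthogonalize(v, u)
m_su = sum(a * b for a, b in zip(s, u)) / len(u)
print(s, m_su)
```

The cross moment $M_{su}$ vanishes by construction, which is what makes the later expressions separate into a part depending on $u$ and a part depending on $s$.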

We can express the reduced form as

(4.8)  $x_t' = \Pi^*_{xu}u_t' + \Pi_{xs}s_t' + \delta_t',$

where

(4.9)  $\Pi^*_{xu} = \Pi_{xu} + \Pi_{xv}M_{vu}M_{uu}^{-1}, \qquad \Pi_{xs} = \Pi_{xv}.$

An estimate of $\Pi_{xs}$ is the regression of $x$ on $s$,

(4.10)  $P_{xs} = M_{xs}M_{ss}^{-1}.$

To estimate $\beta$ we take the $\beta$ that makes $\beta P_{xs}$ smallest in the metric determined by the moment matrix of the residuals

(4.11)  $W_{xx} = M_{xx} - P_{xs}M_{ss}P'_{xs} - P^*_{xu}M_{uu}P^{*\prime}_{xu},$

where

(4.12)  $P^*_{xu} = M_{xu}M_{uu}^{-1}.$

This is the natural generalization of least squares to the component with least variance. This estimate is the vector satisfying

(4.13)  $(P_{xs}M_{ss}P'_{xs} - \nu W_{xx})\hat\beta' = 0$

which is associated with the smallest root of

(4.14)  $|P_{xs}M_{ss}P'_{xs} - \nu W_{xx}| = 0.$

This is normalized and the estimate of $\gamma$ is $\hat\gamma = -\hat\beta P^*_{xu}$.

In section 5 we derive these estimates by the method of maximum likelihood under certain assumptions. Although it is assumed that the disturbances are normally distributed for this derivation, the estimates can be used in more general situations. This theory is in one sense a special case of the theory of estimating a matrix of means of a given dimensionality which is an extension of the discriminant function theory [5]. For an application of this method of estimation see [6].


5. Derivation of maximum likelihood estimates. We derive the estimates of $\beta$, $\gamma$, and $\sigma^2$ under the following assumptions.

Assumption A. The selected structural equation,

(3.11)  $\beta x_t' + \gamma u_t' = \zeta_t,$

is one equation of a complete linear system of $G$ stochastic equations. The equation is identified by the fact that if $H$ is the number of coordinates in $x_t$ there are at least $H - 1$ coordinates in $v_t$, the vector of predetermined variables not in (3.11) but in the system.

Assumption B. At time $t$ all of the coordinates of $z_t = (u_t, v_t)$ are given.

Assumption C. The coordinates of $z_t$ are given functions of exogenous variables and of coordinates of $y_{t-1}, y_{t-2}, \ldots$. If coordinates of $y_0, y_{-1}, \ldots$ are involved in $z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is non-singular with probability one.

Assumption D. The disturbance vectors $\delta_t$ are distributed serially independently and normally with mean zero and covariance matrix $\Omega_{xx}$.

We shall consider normalizations (3.19) where $\Phi_{xx}$ may be a function of other parameters, but

(5.1)  $\partial\Phi_{xx}/\partial\beta^i = 0.$

We can state the results in a theorem.

Theorem 1. Under assumptions A, B, C, and D the maximum likelihood estimate of $\beta$ is

(5.2)  $\hat\beta = b/\sqrt{b\hat\Phi_{xx}b'},$

where $b$ is the solution of

(4.13)  $(P_{xs}M_{ss}P'_{xs} - \nu W_{xx})b' = 0$

corresponding to the smallest value of $\nu$ and $P_{xs}$ is defined by (4.10), $M_{ss}$ by (4.7), and $W_{xx}$ by (4.11). An estimate of $\gamma$ based on the maximum likelihood estimate of $\Pi^*_{xu}$ is given by

(5.3)  $\hat\gamma = -\hat\beta P^*_{xu},$

where $P^*_{xu}$ is given by (4.12). The estimate of $\sigma^2$ is

(5.4)  $\hat\sigma^2 = (1 + \nu)/b\hat\Phi_{xx}b',$

if

(5.5)  $bW_{xx}b' = 1.$

We apply the method of maximum likelihood to

(5.6)  $L = (2\pi)^{-\frac12 HT}|\Omega_{xx}^{-1}|^{\frac12 T}\exp\{-\tfrac12\sum_{t=1}^T (x_t - u_t\Pi'_{xu} - v_t\Pi'_{xv})\Omega_{xx}^{-1}(x_t' - \Pi_{xu}u_t' - \Pi_{xv}v_t')\}$

under the restrictions (4.1) and (3.19). Replacing $v_t$ by $s_t$ and adding (4.1) and (3.19) multiplied by Lagrange multipliers $\lambda$ (a vector of $D$ coordinates) and $\phi$ respectively to the logarithm of $L$ we obtain after division by $T$

(5.7)  $-\tfrac12 H\log 2\pi + \tfrac12\log|\Omega_{xx}^{-1}| + \beta\Pi_{xs}\lambda' + \phi(\beta\Phi_{xx}\beta' - 1) - \frac{1}{2T}\sum_{t=1}^T (x_t - u_t\Pi^{*\prime}_{xu} - s_t\Pi'_{xs})\Omega_{xx}^{-1}(x_t' - \Pi^*_{xu}u_t' - \Pi_{xs}s_t').$

Differentiating (5.7) with respect to $\beta$, we obtain

(5.8)  $\Pi_{xs}\lambda' + 2\phi\Phi_{xx}\beta'.$

Setting this equal to zero and multiplying by $\beta$, we have

$\beta\Pi_{xs}\lambda' + 2\phi\beta\Phi_{xx}\beta' = 0.$

By virtue of (4.1) and (3.19), the Lagrange multiplier $\phi$ must be zero. Hence, as far as the derivatives of (5.7) are concerned the restriction (3.19) does not enter. The setting of the derivatives of (5.7) equal to zero and (4.1) will define $\hat\beta$ except for a constant of proportionality which is finally determined by (3.19). For convenience in deriving the estimates we shall use the normalization

(5.9)  $\hat\beta\hat\Omega_{xx}\hat\beta' = 1.$

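The derivatives of (5.7) with respect to the coordinates of $\Omega_{xx}^{-1}$, $\Pi^*_{xu}$, $\Pi_{xs}$, and $\beta$ are set equal to zero, resulting in

(5.10)  $\hat\Omega_{xx} = M_{xx} - \hat\Pi^*_{xu}M_{ux} - M_{xu}\hat\Pi^{*\prime}_{xu} - \hat\Pi_{xs}M_{sx} - M_{xs}\hat\Pi'_{xs} + \hat\Pi^*_{xu}M_{uu}\hat\Pi^{*\prime}_{xu} + \hat\Pi_{xs}M_{ss}\hat\Pi'_{xs},$

(5.11)  $\hat\Omega_{xx}^{-1}(M_{xs} - \hat\Pi_{xs}M_{ss}) + \hat\beta'\hat\lambda = 0,$

(5.12)  $\hat\Omega_{xx}^{-1}(M_{xu} - \hat\Pi^*_{xu}M_{uu}) = 0,$

(5.13)  $\hat\Pi_{xs}\hat\lambda' = 0.$

Solving (5.12) for $\hat\Pi^*_{xu}$, we obtain

(5.14)  $\hat\Pi^*_{xu} = P^*_{xu},$

defined by (4.12). Solving (5.11) for $\hat\Pi_{xs}$, we obtain

(5.15)  $\hat\Pi_{xs} = P_{xs} + \hat\Omega_{xx}\hat\beta'\hat\lambda M_{ss}^{-1}.$

Multiplying (5.15) by $\hat\beta$ and solving for $\hat\lambda$, we obtain

(5.16)  $\hat\lambda = -\hat\beta P_{xs}M_{ss}.$

Substitution into (5.15) gives

(5.17)  $\hat\Pi_{xs} = (I - \hat\Omega_{xx}\hat\beta'\hat\beta)P_{xs}.$

In view of (5.14) and (5.17) we can write (5.10) as

(5.18)  $\hat\Omega_{xx} = W_{xx} + \hat\Omega_{xx}\hat\beta'\hat\beta P_{xs}M_{ss}P'_{xs}\hat\beta'\hat\beta\hat\Omega_{xx}.$

Let

(5.19)  $\mu = \hat\beta P_{xs}M_{ss}P'_{xs}\hat\beta'.$

Then multiplication of (5.18) on the right by $\hat\beta'$ with use of (5.9) gives

$\hat\Omega_{xx}\hat\beta' = W_{xx}\hat\beta' + \hat\Omega_{xx}\hat\beta'\hat\beta P_{xs}M_{ss}P'_{xs}\hat\beta' = W_{xx}\hat\beta' + \mu\hat\Omega_{xx}\hat\beta',$

that is,

(5.20)  $\hat\Omega_{xx}\hat\beta' = \frac{1}{1 - \mu}W_{xx}\hat\beta'.$

Equation (5.13) can be written as

(5.21)  $P_{xs}M_{ss}P'_{xs}\hat\beta' - \mu\hat\Omega_{xx}\hat\beta' = 0$

by substitution from (5.16), (5.17) and (5.19). Combining (5.20) and (5.21) we obtain

(5.22)  $(P_{xs}M_{ss}P'_{xs} - \nu W_{xx})\hat\beta' = 0,$

where

(5.23)  $\nu = \mu/(1 - \mu).$

For (5.22) to have a solution, $\nu$ must be a root of

(4.14)  $|P_{xs}M_{ss}P'_{xs} - \nu W_{xx}| = 0.$

Substituting from (5.20) into (5.18) we obtain

(5.24)  $\hat\Omega_{xx} = W_{xx} + \mu\left(\frac{1}{1 - \mu}\right)^2 W_{xx}\hat\beta'\hat\beta W_{xx} = W_{xx} + \nu(1 + \nu)W_{xx}\hat\beta'\hat\beta W_{xx}.$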

To determine which root of (4.14) to use we shall compute the value of the likelihood function when these estimates are used. It will be convenient to use the solution $b$ of (4.13) with normalization (5.5). Thus $b$ is proportional to $\hat\beta$; since

$1 = \hat\beta\hat\Omega_{xx}\hat\beta' = \frac{1}{1 - \mu}\hat\beta W_{xx}\hat\beta'$

from (5.20), we see that

$\hat\beta = b\sqrt{1 - \mu} = b/\sqrt{1 + \nu}.$

Let the other solutions of (4.13) be $b_2, \ldots, b_H$, with corresponding roots $\nu_2, \ldots, \nu_H$, and let $B$ be the matrix with rows $b, b_2, \ldots, b_H$, normalized so that $BW_{xx}B' = I$. Since

(5.25)  $|\hat\Omega_{xx}| = |W_{xx} + \nu W_{xx}b'bW_{xx}|,$

we have

(5.26)  $|B|\cdot|\hat\Omega_{xx}|\cdot|B'| = |I + \nu BW_{xx}b'bW_{xx}B'|.$

Since

$bW_{xx}B' = (1, 0, \ldots, 0),$

and since $|B|\cdot|W_{xx}|\cdot|B'| = 1$, we deduce from (5.26)

$|\hat\Omega_{xx}| = |W_{xx}|(1 + \nu).$

Multiplying (5.10) by $\hat\Omega_{xx}^{-1}$, taking the trace, and substituting in (5.6) we obtain

(5.27)  $L = (2\pi e)^{-\frac12 HT}|W_{xx}|^{-\frac12 T}(1 + \nu)^{-\frac12 T}.$

This is a maximum if $\nu$ is the smallest root of (4.14).

The theorem now results. The expression for $\hat\sigma^2$ follows from

$\hat\sigma^2 = \hat\beta\hat\Omega_{xx}\hat\beta' = b\hat\Omega_{xx}b'/b\hat\Phi_{xx}b'.$

If $\Phi_{xx}$ is a known constant matrix, $\hat\Phi_{xx} = \Phi_{xx}$; if $\Phi_{xx}$ is a function of the parameters, $\hat\Phi_{xx}$ is the same function of the estimates.





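If we define

(5.28)  $\hat\gamma = -\hat\beta\hat\Pi_{xu},$

we see by (4.9)

(5.29)  $\hat\gamma = -\hat\beta(\hat\Pi^*_{xu} - \hat\Pi_{xs}M_{vu}M_{uu}^{-1}).$

Since $\hat\beta$ annihilates $\hat\Pi_{xs}$, (5.3) results.

The estimate of $\Pi_{xs}$ is given by (5.17) and the estimate of $\Omega_{xx}$ is

(5.30)  $\hat\Omega_{xx} = W_{xx} + \nu W_{xx}b'bW_{xx}.$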


6. The likelihood ratio test of restrictions. It has been assumed that the selected structural equation is identified by imposing the restrictions that certain coefficients are zero. It was noted in Section 3 that at least $G - 1$ such restrictions are necessary. If $D$, the number of restrictions on the predetermined variables, is more than $H - 1$, we can test the hypothesis that these $D$ coefficients are zero against the alternative that only a smaller number are zero. This is equivalent to a test that $\Pi_{xs}$ is of rank $H - 1$ against the alternative that the rank is $H$.

It can be seen intuitively that the smallest root $\nu$ of (4.14) indicates how near $P_{xs}$ is to being singular. This statistic can be used to test the hypothesis that $\Pi_{xs}$ is of rank $H - 1$. The test is similar to the test of rank suggested by P. L. Hsu [8]. The test is stated precisely in the following theorem.

Theorem 2. Under assumptions A, B, C, and D the likelihood ratio criterion for testing the hypothesis that $\Pi_{xs}$ is of rank $H - 1$ against the alternative that it is of rank $H$ is

(6.1)  $(1 + \nu)^{-\frac12 T},$

where $\nu$ is the smallest root of (4.14).

Proof. If there is no restriction on $\Pi_{xs}$, the maximum likelihood estimate of $\Pi_{xs}$ is $P_{xs}$, of $\Pi^*_{xu}$ is $P^*_{xu}$, and of $\Omega_{xx}$ is $W_{xx}$. Then the likelihood function is

(6.2)  $(2\pi e)^{-\frac12 HT}|W_{xx}|^{-\frac12 T}.$

The ratio of the likelihood function (5.27), maximized under the hypothesis that the rank of $\Pi_{xs}$ is $H - 1$, to (6.2) is (6.1).

It is proved in the paper following the present one that under certain conditions (more general than those of Theorem 2)

(6.3)  $-2\log[(1 + \nu)^{-\frac12 T}] = T\log(1 + \nu)$

is distributed asymptotically as $\chi^2$ with $D - H + 1$ degrees of freedom. Thus an approximate test of significance is given by comparing (6.3) with a significance point of the $\chi^2$-distribution with degrees of freedom equal to the excess number of coefficients required to be zero (i.e., the number beyond the minimum required for identification).
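
The criterion (6.3) and its degrees of freedom are simple to evaluate once $\nu$ is known. The following sketch is illustrative only; the function name `overidentification_statistic` and the numerical inputs are assumptions, and the significance point must still be taken from a $\chi^2$ table.

```python
import math


def overidentification_statistic(nu, T, D, H):
    """Return (T log(1 + nu), D - H + 1), the criterion (6.3) and its degrees of freedom."""
    if nu < 0:
        raise ValueError("the smallest root nu must be non-negative")
    df = D - H + 1
    if df <= 0:
        raise ValueError("no excess restrictions: D must exceed H - 1")
    return T * math.log1p(nu), df


# Made-up values: nu = 0.1 from T = 50 observations, with D = 3 and H = 2.
stat, df = overidentification_statistic(nu=0.1, T=50, D=3, H=2)
print(stat, df)
```

With these made-up inputs there are two excess restrictions, so the statistic would be compared with a tabulated point of $\chi^2$ with 2 degrees of freedom.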



7. Computational procedure. The estimation procedure in sections 4 and 5 does not indicate the most efficient method for computing these estimates. The procedure given here is believed to be efficient for ordinary computational equipment and can easily be adapted for sequence-controlled computing machines.
Let us see what expressions occur in the estimation procedure for $\beta$ and $\gamma$. We find that we need only know $P_{xs}M_{ss}P'_{xs}$, $W_{xx}$, and $P^*_{xu}$; these will suffice if $\Phi_{xx}$ is constant or $\Omega_{xx}$ to estimate $\beta$, $\gamma$, and $\sigma^2$. In what follows, we shall assume the normalization is $\beta^1 = 1$, as the results for other normalizations follow immediately. Examining the estimation equations, we see that we may use any matrices proportional to the moment matrices. If equation (3.11) has a constant term, it is better to use moments about the mean and estimate the constant term by setting the calculated mean of the disturbances equal to zero. One possible method of correcting for the mean is to calculate

(7.1)  $m_{pq} = T\sum_{t=1}^T p_t q_t - \left(\sum_{t=1}^T p_t\right)\left(\sum_{t=1}^T q_t\right).$

The estimation procedure for $\beta$, $\sigma^2$, and the remainder of $\gamma$ is not affected by correcting for the mean. The computational procedure indicated here is unchanged except for a factor of proportionality in the equation for $\sigma^2$ if a different form of correction for the mean is used.

7.1. Calculation of $M_{xz}M_{zz}^{-1}M_{zx}$ and $W_{xx}$. It is known that

(7.2)  $W_{xx} = M_{xx} - M_{xz}M_{zz}^{-1}M_{zx}.$

We shall use (7.2) to compute $W_{xx}$. We shall compute $M_{xz}M_{zz}^{-1}M_{zx}$ by the method given by Dwyer [4]. Let us denote the element in the $i$th row and $j$th column of $M_{zz}$ by $a_{ij}$, and the element in the $i$th row and $j$th column of $M_{zx}$ by $b_{ij}$. Let us construct the following array:

c_11  c_12  ...  c_1K  |  e_11  e_12  ...  e_1H
d_11  d_12  ...  d_1K  |  f_11  f_12  ...  f_1H
      c_22  ...  c_2K  |  e_21  e_22  ...  e_2H
      d_22  ...  d_2K  |  f_21  f_22  ...  f_2H
            ...
                 c_KK  |  e_K1  e_K2  ...  e_KH
                 d_KK  |  f_K1  f_K2  ...  f_KH

where

$c_{ij} = a_{ij} - \sum_{k<i} d_{ki}c_{kj}, \qquad 1 \le i \le j \le K,$

$e_{ij} = b_{ij} - \sum_{k<i} d_{ki}e_{kj}, \qquad 1 \le i \le K,\ 1 \le j \le H,$

$d_{ij} = c_{ij}/c_{ii}, \qquad 1 \le i \le j \le K,$

$f_{ij} = e_{ij}/c_{ii}, \qquad 1 \le i \le K,\ 1 \le j \le H.$

Then the element in the $i$th row and $j$th column of the symmetric matrix $M_{xz}M_{zz}^{-1}M_{zx}$ is

$\sum_{k=1}^K e_{ki}f_{kj}.$

If we wish to estimate several equations in the system by this method, this step need only be done once, as $M_{xz}M_{zz}^{-1}M_{zx}$ and $W_{xx}$ do not depend upon the equation (except that $x$ would be enlarged).
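
The forward elimination described in section 7.1 can be sketched in a few lines. This is an illustrative reading of the abbreviated Doolittle scheme, not code from the paper: the function name `doolittle_quadratic_form` is hypothetical, the letters `a` to `f` follow the array as reconstructed above, and exact rational arithmetic is used so that the small example can be verified by hand.

```python
from fractions import Fraction


def doolittle_quadratic_form(a, b):
    """Compute b' a^{-1} b by forward elimination.

    a is a K x K symmetric non-singular matrix (for example M_zz) and
    b is a K x H matrix (for example M_zx), both given as lists of rows.
    """
    K, H = len(a), len(b[0])
    c = [[Fraction(0)] * K for _ in range(K)]
    d = [[Fraction(0)] * K for _ in range(K)]
    e = [[Fraction(0)] * H for _ in range(K)]
    f = [[Fraction(0)] * H for _ in range(K)]
    for i in range(K):
        # c-row: eliminate the contribution of the earlier rows.
        for j in range(i, K):
            c[i][j] = Fraction(a[i][j]) - sum(d[k][i] * c[k][j] for k in range(i))
        if c[i][i] == 0:
            raise ZeroDivisionError("zero pivot: matrix is singular or needs reordering")
        # d-row: the c-row divided by its leading element.
        for j in range(i, K):
            d[i][j] = c[i][j] / c[i][i]
        # e- and f-rows: the same operations applied to the right-hand block.
        for j in range(H):
            e[i][j] = Fraction(b[i][j]) - sum(d[k][i] * e[k][j] for k in range(i))
            f[i][j] = e[i][j] / c[i][i]
    # Element (i, j) of b' a^{-1} b is the sum over k of e_ki * f_kj.
    return [[sum(e[k][i] * f[k][j] for k in range(K)) for j in range(H)] for i in range(H)]


# Small made-up example with K = 2 and H = 1.
result = doolittle_quadratic_form([[2, 1], [1, 3]], [[1], [2]])
print(result)
```

No explicit inverse of $M_{zz}$ is formed, which is the reason the scheme suited desk calculation.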

7.2. Computation of $P^*_{xu}$. We shall compute $P^*_{xu}$ by the abbreviated Doolittle method. Let us now denote the element in the $i$th row and $j$th column of $M_{uu}$ by $a_{ij}$, of $M_{ux}$ by $b_{ij}$. Then let us perform the previous operations, not including the last step. We may arrange the work, if only one equation is to be estimated, so that this is already done. Then define

$g_{ij} = f_{ij} - \sum_{k>i} d_{ik}g_{kj}, \qquad 1 \le i \le F,\ 1 \le j \le H.$

Then the element in the $i$th row and $j$th column of $P^*_{xu}$ is $g_{ji}$.

7.3. Computation of $P_{xs}M_{ss}P'_{xs}$. We know that

(7.3)  $P_{xs}M_{ss}P'_{xs} = M_{xz}M_{zz}^{-1}M_{zx} - M_{xu}M_{uu}^{-1}M_{ux}.$

Let us compute $P_{xs}M_{ss}P'_{xs}$ using (7.3). We must first calculate $M_{xu}M_{uu}^{-1}M_{ux}$. We may do this either by the method of section 7.1, or as $P^*_{xu}M_{ux}$.

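7.4. Computation of $\nu$, $\hat\beta$, and $\hat\gamma$. We shall use

(5.3)  $\hat\gamma = -\hat\beta P^*_{xu}$

to compute $\hat\gamma$ after $\hat\beta$ has been computed.

Case 1) $H = 1$. In this case the vector $\hat\beta = (1)$, $\nu = P_{xs}M_{ss}P'_{xs}/W_{xx}$.

Case 2) $H = 2$, $D > 1$. Let $a_{ij}$ denote the element in the $i$th row and $j$th column of $P_{xs}M_{ss}P'_{xs}$, $w_{ij}$ the element in the $i$th row and $j$th column of $W_{xx}$. Define

$k_1 = |P_{xs}M_{ss}P'_{xs}|,$
$k_2 = |W_{xx}|,$
$k_3 = \tfrac12(a_{11}w_{22} + a_{22}w_{11} - 2a_{12}w_{12}).$

Then

$\nu = \dfrac{k_3 - \sqrt{k_3^2 - k_1k_2}}{k_2}.$

Let $\theta = P_{xs}M_{ss}P'_{xs} - \nu W_{xx}$. Then

$\hat\beta = \left(1, -\dfrac{\theta_{11}}{\theta_{12}}\right).$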




Case 3) $H = 2$, $D = 1$. In this case $\nu = 0$. Then $\theta = P_{xs}M_{ss}P'_{xs}$, and $\hat\beta$ may be computed as before.

Case 4) $H > 2$, $D > H - 1$. Using the procedure of section 7.2, compute $A = (P_{xs}M_{ss}P'_{xs})^{-1}W_{xx}$. Let us multiply equation (5.22) by $-\frac{1}{\nu}(P_{xs}M_{ss}P'_{xs})^{-1}$ and set $1/\nu = \lambda$. We obtain

(7.4)  $(A - \lambda I)\hat\beta' = 0,$

where $\lambda$ is the largest characteristic root of $A$. Then we may employ the method of Aitken [1] to estimate $\lambda$ and $\hat\beta$. Let $\beta_0$ be an approximation to $\hat\beta$. The column of $A$ with largest absolute values is generally a satisfactory approximation. Define

$\beta_i' = A\beta_{i-1}', \qquad \lambda_{ij} = \beta_i^j/\beta_{i-1}^j.$

The quantities $\lambda_{ij}$ approach $\lambda$ as $i$ increases, and the normalized vectors $\beta_i$ approach $\hat\beta$. The convergence may be accelerated by the methods given by Aitken. The normalization should not be carried out until the $\lambda_{ij}$ are sufficiently close for different $j$.

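Case 5) $H > 2$, $D = H - 1$. Let us go through the procedure of section 7.2 with $a_{ij}$ the elements of $P_{xs}M_{ss}P'_{xs}$, and with no matrix $b$. Then $c_{HH} = 0$. Set $g_H = 1$, and compute

$g_k = -\sum_{i>k} d_{ki}g_i.$

Then $\hat\beta$ is proportional to $(g_1, \ldots, g_H)$, and

$\nu = 0.$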
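
Two of the cases above lend themselves to a short numerical sketch. The code below is illustrative and not from the paper: `smallest_root_2x2` follows the closed form of Case 2, and `largest_root` is a plain power iteration in the spirit of Case 4 (without Aitken's acceleration, and rescaling at every step for numerical safety rather than deferring normalization as recommended for hand computation). The function names and the matrices are assumptions chosen so that the two routes can be compared.

```python
import math


def smallest_root_2x2(a, w):
    """Case 2: smallest root nu of |a - nu w| = 0 for symmetric 2 x 2 matrices,
    and the vector beta = (1, beta^2) satisfying (a - nu w) beta' = 0."""
    k1 = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    k2 = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    k3 = 0.5 * (a[0][0] * w[1][1] + a[1][1] * w[0][0] - 2 * a[0][1] * w[0][1])
    nu = (k3 - math.sqrt(max(k3 * k3 - k1 * k2, 0.0))) / k2
    theta11 = a[0][0] - nu * w[0][0]
    theta12 = a[0][1] - nu * w[0][1]
    if theta12 == 0:
        raise ValueError("theta_12 is zero; use the second row of theta instead")
    return nu, (1.0, -theta11 / theta12)


def largest_root(A, beta0, tol=1e-12, max_iter=10000):
    """Case 4: power iteration for the largest characteristic root of A.
    Returns (lambda, beta) with beta normalized so that beta^1 = 1."""
    n = len(A)
    beta = [float(x) for x in beta0]
    lam = 0.0
    for _ in range(max_iter):
        nxt = [sum(A[i][j] * beta[j] for j in range(n)) for i in range(n)]
        pivot = max(range(n), key=lambda i: abs(nxt[i]))
        new_lam = nxt[pivot]
        if new_lam == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        beta = [x / new_lam for x in nxt]  # rescale so the pivot coordinate is 1
        converged = abs(new_lam - lam) <= tol * abs(new_lam)
        lam = new_lam
        if converged:
            break
    if beta[0] == 0.0:
        raise ValueError("normalization beta^1 = 1 is not possible")
    return lam, [x / beta[0] for x in beta]


# Made-up matrices standing in for P_xs M_ss P_xs' (a) and W_xx (w).
a = [[2.0, 1.0], [1.0, 3.0]]
w = [[1.0, 0.0], [0.0, 1.0]]
nu, beta_closed = smallest_root_2x2(a, w)

# A = a^{-1} w, written out by hand for this 2 x 2 example.
A = [[0.6, -0.2], [-0.2, 0.4]]
lam, beta_iter = largest_root(A, beta0=[0.6, -0.2])
print(nu, 1.0 / lam, beta_closed, beta_iter)
```

For a two-by-two problem the two routes should agree, with $\lambda = 1/\nu$ and the same normalized vector, which gives a convenient check on the iteration before applying it with $H > 2$.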


7.5. Computation of $\hat\sigma^2$. We have

(7.5)  $\hat\sigma^2 = \hat\beta\hat\Omega_{xx}\hat\beta' = (1 + \nu)\hat\beta W_{xx}\hat\beta'.$

If we use the $m_{pq}$'s instead of the $M$'s, we must divide by $T^2$, and if other factors of proportionality are used, we must divide by them. $\hat\sigma^2$ is in general biased, but the bias depends upon the nature of the complete system, and is not easy to calculate. The bias is of the order of $1/T$.


8. Confidence regions based on small sample theory.⁴ If all of the predetermined variables in the system are exogenous (i.e., "fixed"), we can obtain confidence regions for the coefficients of one equation on the basis of small sample theory. To do this we require only that the disturbance of the selected equation be normally distributed; that is, the linear form in the observations $\beta x_t' + \gamma u_t'$ is normally distributed with mean zero and variance $\sigma^2$. The regression of this on fixed variates is normally distributed and certain quadratic forms in these linear forms have $\chi^2$-distributions. On the basis of this we can set up confidence regions for the coefficients.

⁴ We are indebted to Professor A. Wald for assistance in simplifying our approach to this problem.

In addition to assumptions A and B we use the following:
Assumption E. All of the coordinates of $z_t = (u_t, v_t)$ are exogenous. The moment matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are distributed independently and normally with mean 0 and variance $\sigma^2$.
Suppose we have a set of observations $(x_1, u_1, v_1), \ldots, (x_T, u_T, v_T)$. If we know $\beta$ and $\gamma$ we can obtain $T$ values of

(8.1)  $w_t = \beta x_t' + \gamma u_t', \qquad t = 1, \ldots, T.$

The sample regression coefficients of $w_t$ on $u_t$ and $s_t$ are


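(8.2)  $c = \frac{1}{T}\sum_{t=1}^T w_t u_t M_{uu}^{-1} = \beta M_{xu}M_{uu}^{-1} + \gamma,$

(8.3)  $e = \frac{1}{T}\sum_{t=1}^T w_t s_t M_{ss}^{-1} = \beta M_{xs}M_{ss}^{-1}.$

The two vectors $c$ and $e$ are distributed independently and normally with mean 0 and covariance matrices

(8.4)  $\mathcal{E}(c'c) = \frac{\sigma^2}{T}M_{uu}^{-1},$

(8.5)  $\mathcal{E}(e'e) = \frac{\sigma^2}{T}M_{ss}^{-1}.$

Hence (by usual regression theory)

(8.6)  $\frac{T}{\sigma^2}cM_{uu}c' = \frac{T}{\sigma^2}(\beta M_{xu}M_{uu}^{-1}M_{ux}\beta' + \beta M_{xu}\gamma' + \gamma M_{ux}\beta' + \gamma M_{uu}\gamma'),$

(8.7)  $\frac{T}{\sigma^2}eM_{ss}e' = \frac{T}{\sigma^2}\beta M_{xs}M_{ss}^{-1}M_{sx}\beta',$

(8.8)  $\frac{T}{\sigma^2}(\beta M_{xx}\beta' - \beta M_{xz}M_{zz}^{-1}M_{zx}\beta') = \frac{T}{\sigma^2}\beta W_{xx}\beta'$

are distributed independently as $\chi^2$ with $F$, $D$, and $T - K$ degrees of freedom, respectively. On the basis of these considerations we can obtain the desired confidence regions.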

jTt_ 1 


«s 
(8 9 ) 




^ - I 



rsmisrinv. of ruuMbTnm 


01 


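where the matrix of the normalization (8.9) is a given matrix, (a) a confidence region for $\beta$ of confidence $\epsilon$ consists of
all $\beta^*$ satisfying (8.9) and

(8.10) $\dfrac{\beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime}}{\beta^* W_{xx} \beta^{*\prime}} \le \dfrac{D}{T - K} F_{D,T-K}(\epsilon),$

where $F_{D,T-K}(\epsilon)$ is chosen so the probability of (8.10) for $\beta^* = \beta$ is $\epsilon$. (b) A
confidence region for $\beta$ and $\gamma$ simultaneously consists of all $\beta^*$ and $\gamma^*$ satisfying
(8.9) and

(8.11) $\dfrac{\beta^* M_{xu} M_{uu}^{-1} M_{ux} \beta^{*\prime} + \beta^* M_{xu} \gamma^{*\prime} + \gamma^* M_{ux} \beta^{*\prime} + \gamma^* M_{uu} \gamma^{*\prime} + \beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime}}{\beta^* W_{xx} \beta^{*\prime}} \le \dfrac{K}{T - K} F_{K,T-K}(\epsilon).$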


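(c) If the normalization is $\sigma^2 = 1$, then a confidence region for $\beta$ of confidence
$\epsilon_1 \epsilon_2$ consists of all $\beta^*$ satisfying

(8.12) $T \beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime} \le \chi^2_D(\epsilon_1),$

(8.13) $\underline{\chi}^2_{T-K}(\epsilon_2) \le T \beta^* W_{xx} \beta^{*\prime} \le \bar{\chi}^2_{T-K}(\epsilon_2),$

where $\chi^2_D(\epsilon_1)$ is chosen so that the probability of (8.12) is $\epsilon_1$ when $\beta^* = \beta$, and
$\underline{\chi}^2_{T-K}(\epsilon_2)$ and $\bar{\chi}^2_{T-K}(\epsilon_2)$ are chosen so that the probability of (8.13) is $\epsilon_2$ when $\beta^* = \beta$ and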


(H U) 


X a (^) < l < X*(*.) 


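(d) A confidence region for $\beta$ and $\gamma$ simultaneously consists of all $\beta^*$ and $\gamma^*$ satisfying
(8.13) and

(8.15) $T [\beta^* M_{xu} M_{uu}^{-1} M_{ux} \beta^{*\prime} + \beta^* M_{xu} \gamma^{*\prime} + \gamma^* M_{ux} \beta^{*\prime} + \gamma^* M_{uu} \gamma^{*\prime} + \beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime}] \le \chi^2_K(\epsilon_1).$

Region (c) is the intersection of the interior of an ellipsoid and an ellipsoidal shell in the $\beta^*$-space;
region (d) is similar in the $\beta^*, \gamma^*$-space. Region (a) consists of the intersection
of the quadric surface (8.9) and the interior of a cone in the $\beta^*$-space; region (b)
is similar in the $\beta^*, \gamma^*$-space.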

It is clear that there are many other ways of constructing confidence regions
by taking regression on other fixed variates. Of these the best seem to be those
of Theorem 3. It has been proved [2] that the regions of Theorem 3 are consistent
in the sense that for sufficiently large $T$ the probability is arbitrarily near 1 that
all of the confidence region is within a certain distance of $\beta$ or $\beta, \gamma$. For an
application of this technique to economic data see a paper by Bartlett [3], who
suggested this method independently.
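The ratio on which region (a) rests can be illustrated with a short computation. The sketch below is not the authors' procedure; it assumes the standard regression identity that the sum of squares explained by the part of $v_t$ orthogonal to $u_t$ equals the drop in residual sum of squares when $v_t$ is added to a regression on $u_t$. The function names (`solve`, `rss`, `region_statistic`) are hypothetical, the normalization (8.9) is not checked, and the critical value $F_{D,T-K}(\epsilon)$ has to be supplied from tables by the reader.

```python
def solve(a, b):
    """Solve the linear system a c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = s / aug[r][r]
    return sol


def rss(regressors, w):
    """Residual sum of squares of w regressed on the rows of `regressors`."""
    p = len(regressors[0])
    xtx = [[sum(row[i] * row[j] for row in regressors) for j in range(p)]
           for i in range(p)]
    xtw = [sum(row[i] * wt for row, wt in zip(regressors, w)) for i in range(p)]
    coef = solve(xtx, xtw)
    return sum((wt - sum(c * val for c, val in zip(coef, row))) ** 2
               for row, wt in zip(regressors, w))


def region_statistic(beta_star, x, u, v):
    """(T - K)/D times the ratio of explained to residual sum of squares.

    x, u, v are lists of rows (one row per observation t); beta_star is a
    candidate coefficient vector for the coordinates of x.
    """
    T, D = len(x), len(v[0])
    K = len(u[0]) + D
    # w_t = beta* x_t'; the term gamma u_t' is absorbed by the regression on u.
    w = [sum(b * xi for b, xi in zip(beta_star, row)) for row in x]
    z = [ur + vr for ur, vr in zip(u, v)]  # all predetermined variables
    rss_u, rss_z = rss(u, w), rss(z, w)
    return ((rss_u - rss_z) / D) / (rss_z / (T - K))


# Small made-up data set: one constant regressor u, one excluded variable v.
v_col = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
e = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
x_rows = [[2 * vi + 1 + ei, vi] for vi, ei in zip(v_col, e)]
u_rows = [[1.0] for _ in v_col]
v_rows = [[vi] for vi in v_col]
```

A candidate $\beta^*$ would be kept in the region when `region_statistic(beta_star, x_rows, u_rows, v_rows)` does not exceed the tabulated $F_{D,T-K}(\epsilon)$.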


9. An approximate small sample test of restrictions. When $\beta^* = \beta$, the
probability of (8.10) is $\epsilon$. If $\beta^*$ is replaced by the vector which minimizes the expression
on the left, the probability is at least as great; it is, say, $1 - \delta$.
the smallest root of 


(9.1) 


Ain il /11 M a 


T 

x , __ 

A T - K 



0, 


TIhh ratio i i ! \ t 


Sance 
(9 2) 


A 


T-K 

TD 


where $\nu$ is the smallest root of (4.14), the probability of

TD 

(9 3) v > ipZTJt ^o,t-M 

is $\delta \le (1 - \epsilon)$. We summarize this as follows.

Theorem 4. Under assumptions A, B, and E, the inequality (9.3), where $\nu$ is
the smallest root of (4.14), constitutes a test of the hypothesis that the coefficients of
$v_t$ in the selected structural equation are zero, of significance less than $1 - \epsilon$.

This test is simply an approximation to the tent given in km*U nit 0 Thu 
exact probability, 6, of (9 3) is unknown; in fact the distribution of v depend** on 
n„ and the distribution of $i Ilowcvei, aince fi lies between 0 iunl 1 i, ue 
know that if the test is used as though the level were $1 - \epsilon$, the test will be
"conservative."

Another approximate test of the restrictions can be obtained from the
inequality (8.11). If the hypothesis is rejected on the basis of one of these tests,
the corresponding confidence region (for $\beta$ or for $\beta$ and $\gamma$) is imaginary, for all
$\beta$ or $\beta$ and $\gamma$ are excluded. It should be noticed that the use of a given ratio
to test the hypothesis at significance level $\delta$ $(\le 1 - \epsilon)$ does not affect the
confidence coefficient $\epsilon$ of the confidence region when the hypothesis is true.


REFERENCES


[1] A. C. Aitken, "Studies in practical mathematics. II. The evaluation of the latent
roots and latent vectors of a matrix," Roy. Soc. Edinburgh Proc., Vol. 57 (1936-37),
pp. 269-305.

[2] T. W. Anderson and Herman Rubin, "The asymptotic properties of estimates of the
parameters of a single equation in a complete system of stochastic equations,"
to be published.

[3] M. S. Bartlett, "A note on the statistical estimation of demand and supply relations
from time series," Econometrica, Vol. 16 (1948), pp. 323-329.

[4] P. S. Dwyer, "The evaluation of linear forms," Psychometrika, Vol. 6 (1941), pp. 355-365.

[5] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of
Eugenics, Vol. 8 (1938), pp. 376-386.

[6] M. A. Girshick and T. Haavelmo, "Statistical analysis of the demand for food:
examples of simultaneous estimation of structural equations," Econometrica,
Vol. 15 (1947), pp. 79-110.

l? ' T * 6ySt ' n ’ 0t •“ 





[8] P. L. Hsu, "On the problem of rank and the limiting distribution of Fisher's test
function," Annals of Eugenics, Vol. 11 (1941), pp. 39-41.
[9] H. B. Mann and A. Wald, "On the statistical treatment of linear stochastic difference
equations," Econometrica, Vol. 11 (1943), pp. 173-220.
[10] O. Reiersøl, "Confluence analysis by means of lag moments and other methods of
confluence analysis," Econometrica, Vol. 9 (1941), pp. 1-24.
[11] Statistical Inference in Dynamic Economic Systems, to be published as Cowles
Commission Monograph No. 10.



SOME SIGNIFICANCE TESTS FOR THE MEDIAN WHICH ARE 
VALID UNDER VERY GENERAL CONDITIONS¹


By John E. Walsh
The Rand Corporation


1. Summary. Order statistics are used to derive significance tests for the
population median which are valid under very general conditions. These tests
are approximately as powerful as the Student t-test for small samples from a
normal population. Also the application of a test requires very little computation.
Thus the tests derived compare very favorably with the t-test for small
sets of observations. Applications of these order statistic tests to certain well
known statistical problems are given in another paper [1].


PART I, RESULTS AND DEFINITIONS 


2. Introduction. Consider $n$ independent observations drawn from $n$ populations
satisfying the conditions (A):

1) Each population is continuous (i.e. its cdf is continuous).

2) Each population is symmetrical.

3) The median of each population has the same value $\phi$. (If the median
of a continuous symmetrical population is not unique, the median $\phi$ of the population
is defined to be the midpoint of the segment of possible values.)

It is to be emphasized that no two of the observations are necessarily drawn
from the same population. Significance tests are derived to compare $\phi$ with a
given constant value $\phi_0$.

A general method of obtaining one-sided and symmetrical tests is given in section 8.
This general method furnishes tests which have significance levels of the
form $r/2^n$, $(r = 1, \dots, 2^n - 1)$. Each value of $r$ can be obtained for some
one-sided test. Unfortunately tests obtained by the general method are very difficult
to apply from a computational viewpoint. If $n > 10$, the number of computations
required for the application of a test is prohibitive.

To overcome the computational difficulty involved in using the general method,
easily applied tests using order statistics are derived. These tests are based on
order statistics of certain combinations of order statistics of the $n$ observations,
each combination being either a single order statistic of the $n$ observations or
one-half the sum of two order statistics. The tests are invariant under permutation
of the $n$ observations and have significance levels of the form $r/2^n$,
$(r = 1, \dots, 2^n - 1)$. Table 1 contains a list of some one-sided and symmetrical
tests for $n \le 15$ ($x_1, \dots, x_n$ represent the $n$ observations arranged in increasing
order of magnitude). Additional significance tests can be obtained by use of
Theorem 4 of section 6.


JJ*lT li n prc r nte f d ll n Jfj." p T^ ern oblnincci 1,1 rim coureo o( r^cntdi conduced 

the wi £ h n,s re,wch wu p "'"" 1 " 1 * h,h 



If a symmetrical population has a mean, the mean has the same value as the
median. Thus if each population from which an observation is drawn satisfies
the additional condition that its mean exists, the median tests derived in this
paper are also tests for the mean.

Although it is unlikely that conditions (A) are exactly satisfied in practice,
these conditions appear to be approximately satisfied in many practical
situations. Moreover conditions (A) are of such a simple form that approximate
verification can usually be obtained without an extensive investigation.

Certain of the order statistic tests are very efficient if the $n$ observations are a
sample from a normal population. Efficiencies are listed for some of the tests in
Table 1. These tests are approximately as efficient as the Student t-test. (The
efficiency of a test, more precisely the power efficiency, is defined in section 3.)

The order statistic tests are competitive with the Student t-test. In choosing
between the two types of tests the following considerations may be of interest:

(a) The order statistic tests are valid under much more general conditions than
the t-test.

(b) The order statistic tests are almost as efficient as the t-test for small samples
from a normal population.

(c) The order statistic tests are more easily computed than the t-test.

(d) For the case of a sample from a normal population and near significance,
the t-test gives more information than the order statistic tests.

In some cases a set of $n$ independent observations satisfying only 1) and 3) of
conditions (A) can be transformed into observations approximately satisfying all
of conditions (A) by an appropriate continuous monotone change of variable.
For example, replacing each observation by the logarithm of the value of the
observation sometimes results in a set of observations having approximately
symmetrical distributions. Since the transformation, say $g(x)$, is continuous
and monotone, the resulting observations will have median $g(\phi)$ if the original
observations have median $\phi$. Confidence intervals can be found for $\phi$ by first
obtaining confidence intervals for $g(\phi)$ on the basis of conditions (A) and then
inverting. Significance tests can be obtained from these confidence intervals.
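The transform-then-invert step can be sketched in a few lines. This is only an illustration: the statistic $\min[x_2, \frac{1}{2}(x_1 + x_3)]$ is chosen here as the mirror image of a max-type statistic and is an assumption of the example, as are the function names. The transformation is taken to be increasing, so that a lower bound for $g(\phi)$ maps back to a lower bound for $\phi$.

```python
import math


def lower_bound_statistic(values):
    """min[x_2, (x_1 + x_3)/2] computed from the ordered values (illustrative choice)."""
    xs = sorted(values)
    return min(xs[1], 0.5 * (xs[0] + xs[2]))


def back_transformed_bound(observations, g=math.log, g_inverse=math.exp):
    """Bound g(phi) on the transformed scale, then invert to get a bound for phi.

    g must be continuous and increasing; the logarithm is the example in the text.
    """
    transformed = [g(value) for value in observations]
    return g_inverse(lower_bound_statistic(transformed))


# Hypothetical positive observations whose logarithms are 0, 1, ..., 5.
sample = [math.exp(k) for k in range(6)]
bound = back_transformed_bound(sample)
```

The confidence coefficient attached to the back-transformed bound is the one the statistic carries on the transformed scale, since the inversion does not change the event.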
The tests of Part I can be applied to furnish generalized solutions for several
well known statistical problems. Some of these applications are given in another
paper [1].

One application occurs in cases where there is reason to believe that conditions
(A) are satisfied but there is no reason to assume that the populations from
which the observations were drawn are even approximately the same. Perhaps
the most common situation of this type is that in which the value of a certain
quantity is experimentally determined by several different methods, all of which
should theoretically yield the same result. Then there is no reason to believe
that all the experimental values have the same precision. It may be permissible,
however, to assume that each value is an observation from a continuous
symmetrical population and that all the populations have the same median. Then
the order statistic tests can be used to test the true value of the quantity
investigated. For example, consider the determination of a specified physical constant.



[TABLE 1. One-sided and symmetrical significance tests for $n \le 15$, with efficiencies; table body illegible in source.]


Various scientists obtained experimental values for this constant by several different
methods. If it can be assumed that each value is an observation from a
continuous symmetrical population and that all the populations have the same
median, the true value of the physical constant can be tested by applying the
order statistic tests to the totality of the experimental values.

3. Power efficiency of tests. A problem which arises throughout the paper
is that of determining how much information is lost by using some other test in
place of the most powerful test of a given hypothesis. The measure
of the amount of available information which is used by a test will be given as a
percentage and is called the power efficiency of the test considered.

In all cases investigated the underlying population is normal with unknown
variance and the hypotheses tested concern the population median (mean).
Then the most powerful test (one-sided or symmetrical) is the appropriate
Student t-test.

The procedure used to measure the power efficiency of a test is different from
the common method of measuring the efficiency of an estimate. The efficiency
of an estimate is obtained by taking the ratio of the variance of an efficient
estimate with respect to the variance of the given estimate (expressed as a
percentage). The method of determining the power efficiency of a test, however,
consists in continuously varying the sample size of the appropriate most powerful
test (same significance level) until the power functions of the given test and the
most powerful test are equivalent in the following sense: The area between the
two power curves for which the power function of the most powerful test exceeds
the power function of the given test is equal to the analogous area for which the
power function of the most powerful test is less than that of the given test. (It
is assumed that the power functions of the tests can be made to depend on the
values of a single parameter.) The sample size (not necessarily integral) of the
most powerful test with equivalent power function divided by the sample size of the
given test is called the power efficiency of the given test (expressed as a percentage).
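The equal-area criterion can be written compactly. The symbols below are introduced only for this illustration and do not appear in the paper: $P_t(\theta; m)$ denotes the power function of the t-test with (possibly non-integral) sample size $m$, $P_g(\theta)$ that of the given test based on $n$ observations, and $\theta$ the single parameter on which both depend. The numbers in the last line are illustrative arithmetic only.

```latex
% Equal-area equivalence of power functions (notation assumed for illustration).
\[
  \int \bigl[P_t(\theta; m^*) - P_g(\theta)\bigr]^{+}\, d\theta
  \;=\;
  \int \bigl[P_g(\theta) - P_t(\theta; m^*)\bigr]^{+}\, d\theta ,
  \qquad [u]^{+} = \max(u, 0),
\]
% so that the signed area between the two curves vanishes at m = m^*.
\[
  \text{power efficiency} \;=\; 100 \cdot \frac{m^*}{n} \ \text{per cent},
  \qquad \text{e.g. } m^* = 5.82,\ n = 6 \ \Rightarrow\ 97 \text{ per cent}.
\]
```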

In obtaining power efficiencies in the manner defined above, the sample size
of the most powerful test is allowed to assume non-integral values. This
furnishes an interpolated measure of the sample size of the most powerful test which is
power function equivalent to the given test. As pointed out above, the t-test
is a most powerful test for the situations considered in this paper. A method of
computing power function values for t-tests having non-integral sample sizes is
given below.

The definition of power efficiency selected is very convenient from a computational
point of view. Power function values for the t-test can be easily computed
through use of the normal approximation given in [2]. For the significance levels
considered in this paper, the normal approximation is reasonably accurate if
the sample size is not too small. In the remaining cases the approximation
underestimates some power function values and overestimates others. For the
situations investigated, however, the error introduced by this combination of



TABLE 2

Efficiencies and power function values for certain order statistic tests

Significance test                        Sample       Approx.      Signif.      Values of power function
                                         size         efficiency   level        δ = .6       δ = 1.2      δ = 1.8

t                                        4.9                       .0625        .337         .755         .961
½(x_4 + x_5) < φ_0                       5            [illegible]  .0625        .343         .755         .958
t                                        5.82                      .0469        .327         .779         .980
max[x_5, ½(x_4 + x_6)] < φ_0             6            97           .0469        .334         .779         .972
t                                        5.88                      .0312        .214         .682         .951
½(x_5 + x_6) < φ_0                       6            98           .0312        .254         .687         .912
t                                        6.65                      .0547        .406         .809         .994
max[x_5, ½(x_4 + x_7)] < φ_0             7            95           .0547        .413         .867         .991
t                                        6.85                      .0234        .239         .716         .969
max[x_6, ½(x_5 + x_7)] < φ_0             7            98           .0234        .249         .717         .962
t                                        7.55                      .0430        .395         .882         .996
max[x_6, ½(x_4 + x_8)] < φ_0             8            94.5         .0430        .401         .879         .993
t                                        7.85                      .0117        .174         .650         .956
max[x_7, ½(x_6 + x_8)] < φ_0             8            98           .0117        .185         .650         .949
t                                        8.61                      .0215        .302         .839         .994
max[x_7, ½(x_5 + x_9)] < φ_0             9            96           .0215        .311         .834         .990
t                                        8.9                       .0059        .127         .597         .917
max[x_8, ½(x_7 + x_9)] < φ_0             9            99           .0059        .137         .599         .935
t                                        [illegible]               .0547        .450         [illegible]  .998
x_8 < φ_0                                10           75           .0547        .454         .901         .995
t                                        9.05                      [illegible]  .227         [illegible]  .991
[illegible]                              10           90.5         [illegible]  .237         .780         .986
t                                        [illegible]               [illegible]  [illegible]  .668         .901
[illegible]                              [illegible]  80           [illegible]  [illegible]  .677         .952
t                                        8.9                       .0059        .141         .621         .951
x_10 < φ_0                               11           81           .0059        .152         .631         .942
t                                        [illegible]               .0102        .277         .870         .998
[illegible]                              [illegible]  93.5         .0102        .288         .862         .995


underestimation and overestimation tends to cancel out in the determination of
power efficiencies if the above area definition of equivalence of power functions is
used. Thus application of the normal approximation yields reasonably
accurate power efficiencies for the cases considered in this paper. Use of the
normal approximation furnishes an easily applied method of obtaining power
function values for t-tests having non-integral sample sizes.

Table 2 contains examples of the above described method of determining power
efficiencies. Here the power function values for the t-test were computed using
the normal approximation. Examination of Table 2 shows that the maximum
difference between corresponding power function values for the two types of
tests is small for all the cases considered there. This holds in the determination
of all the power efficiencies listed in Table 1.

Investigation indicates that the definition of power efficiency given here is for
all practical purposes the same as that given in [3].

For the situations considered in this paper, it is sufficient to restrict power
efficiency investigations to one-sided tests. Every symmetric test investigated
can be considered as a combination of two non-overlapping one-sided tests,
each having a significance level equal to half that of the symmetric test. Also,
from symmetry, these one-sided tests (each considered as a separate test) have
the same power efficiency. Thus it is an immediate consequence of the definition
of power efficiency that the symmetric test has the same efficiency as each of the
corresponding one-sided tests at half the significance level.

PART II. DERIVATIONS

4. Introduction. The purpose of the remainder of the paper is to present
derivations of the significance test results stated in sections 1 and 2. The first
derivations consist in obtaining confidence intervals for $\phi$ on the basis of
conditions (A). Then properties of these confidence intervals are analyzed.
Application of the confidence intervals and their properties to significance tests furnishes
many of the results stated in sections 1 and 2. The remaining derivations are
concerned with efficiencies and the general method mentioned in section 2.


5. Derivation of confidence intervals. Let us consider $n$ independent
observations, each observation being drawn from a possibly different population.
Denote these observations by $y_1, \dots, y_n$ and let the cdf of $y_i$ be given by $F_i$,
$(i = 1, \dots, n)$. Furthermore let the $n$ populations from which these $n$
observations were drawn satisfy conditions (A). Then 1) of conditions (A)
requires that each $F_i$ is continuous, while 2) and 3) stipulate that

$\int_{-\infty}^{-c} dF_i(y - \phi) = \int_{c}^{\infty} dF_i(y - \phi)$, $(i = 1, \dots, n)$,

for all values of $c$ in the interval $-\infty < c < \infty$.

Let $x_1, \dots, x_n$ represent $y_1, \dots, y_n$ arranged in increasing order of magnitude.
Since the cdf's are continuous, $\Pr(x_i = x_j; i \ne j) = 0$. For the situations
treated in this paper, it is sufficient to consider one-sided confidence intervals
for $\phi$. All one-sided confidence intervals derived have one of the forms

(1) $g(x_1, \dots, x_n) < \phi$, $\quad h(x_1, \dots, x_n) > \phi$,

where $g$ and $h$ are Borel measurable functions of $x_1, \dots, x_n$ such that

$\Pr[g(x_1, \dots, x_n) < \phi] = \Pr[g(x_1 - \phi, \dots, x_n - \phi) < 0]$,

$\Pr[h(x_1, \dots, x_n) > \phi] = \Pr[h(x_1 - \phi, \dots, x_n - \phi) > 0]$.

Consider the additional condition

(B) All populations are the same.

In terms of cumulative distribution functions condition (B) requires that all
the cdf's $F_i$ are equal to some cdf $F$. A theorem will be proved which shows that
all confidence intervals of the forms (1) derived on the basis of both conditions
(A) and (B) are also valid if only conditions (A) necessarily hold; i.e. if

$\Pr[g(x_1, \dots, x_n) < \phi] = p$

whenever $x_1, \dots, x_n$ are order statistics of observations from populations
satisfying conditions (A) and (B), then this probability expression also has the value
$p$ if $x_1, \dots, x_n$ are from populations necessarily satisfying only conditions (A).
Similarly for $\Pr[h(x_1, \dots, x_n) > \phi]$.

THEOREM 1. Let $Q(x_1 - \phi, \dots, x_n - \phi)$ be a probability statement involving
$x_1 - \phi, \dots, x_n - \phi$, which defines a Borel measurable region $R(x_1 - \phi, \dots, x_n - \phi)$
of the n-dimensional order statistic space. If

(2) $Q(x_1 - \phi, \dots, x_n - \phi) = p$

whenever $x_1, \dots, x_n$ are order statistics of $n$ independent observations from populations
satisfying conditions (A) and (B), then (2) also holds when $x_1, \dots, x_n$ are
order statistics of $n$ independent observations from populations necessarily
satisfying only conditions (A).

PROOF. It is sufficient to consider the case in which $\phi = 0$. Then, if conditions
(A) are satisfied, the joint probability element of $x_1, \dots, x_n$ is

$dF(x_1, \dots, x_n) = \sum_{\pi} dF_1(x_{\pi(1)}) \cdots dF_n(x_{\pi(n)})$,

where the summation is taken over all permutations $\pi$ of the integers $1, \dots, n$,
and the $F_i$ are cdf's of symmetrical populations with zero median. Let $R =
R(x_1, \dots, x_n)$ be the region of the n-dimensional order statistic space defined
by the probability statement $Q(x_1, \dots, x_n)$. Then Theorem 1 stipulates that

(3) $\int_R dF(x_1, \dots, x_n) = p$

whenever $x_1, \dots, x_n$ are from populations satisfying conditions (A) and (B)
with zero median. In this case, however, each $F_i = F$ and (3) becomes

(4) $n! \int_R \prod_{i=1}^{n} dF(x_i) = p$,

where $F$ is the cdf of a population satisfying conditions (A) and (B) with zero
median. Let

$P = \prod_{i=1}^{n} \Bigl( \sum_{j=1}^{n} dF_j(x_i) \Bigr)$

and define $S_\alpha^\beta$ to be the sum of all terms in the expansion of $P$ which contain a
specified $\alpha$ of $dF_1, \dots, dF_n$ and no others; the particular set chosen is denoted
by $\beta$, where $\beta = 1, \dots, \binom{n}{\alpha}$. Then

$P = dF(x_1, \dots, x_n) + \sum_\beta S_{n-1}^\beta + \cdots + \sum_\beta S_1^\beta$.

Now consider any given $S_\alpha^\beta$ (i.e. $\alpha$, $\beta$ given). Define $dH$ to be the sum of the
$\alpha$ of $dF_1, \dots, dF_n$ pertaining to $\beta$ plus any set of zero or more of the remaining
$dF_j$'s. Then no matter which of the remaining $dF$'s are chosen for $dH$, the sum
of those terms in the expansion of $\prod_{i=1}^{n} dH(x_i)$ which contain the particular set of
$\alpha$ of $dF_1, \dots, dF_n$ is always equal to $S_\alpha^\beta$. Let

$P_\alpha = \sum_\beta \Bigl( \prod_{i=1}^{n} dG_\beta(x_i) \Bigr)$,

where $dG_\beta$ equals the sum of the $\alpha$ of $dF_1, \dots, dF_n$ pertaining to $\beta$. Then from
the above and the symmetrical fashion in which the $dF$'s are treated,

$P_\alpha = \sum_\beta S_\alpha^\beta + K_{\alpha-1} \sum_\beta S_{\alpha-1}^\beta + \cdots + K_1 \sum_\beta S_1^\beta$,

where the $K_u$, $(u = 1, \dots, \alpha - 1)$, are constants.

Consider the case in which $\alpha = n - 1$. Using the above expression for $P_\alpha$,

$P = dF(x_1, \dots, x_n) + P_{n-1} + (1 - K_{n-2}) \sum_\beta S_{n-2}^\beta + \cdots + (1 - K_1) \sum_\beta S_1^\beta$.

Repeating this procedure successively for $\alpha = n - 2, n - 3, \dots, 1$ shows that

$dF(x_1, \dots, x_n) = P + C_{n-1} P_{n-1} + \cdots + C_1 P_1$,

where the $C_v$, $(v = 1, \dots, n - 1)$, are constants.

Since each $F_i$ is the cdf of a symmetrical population with zero median,

$G_\beta / \alpha = \frac{1}{\alpha}$ (sum of the $\alpha$ of $F_1, \dots, F_n$ pertaining to $\beta$)

is likewise the cdf of a continuous symmetrical population with zero median. But

K ' «* t) " 

Hence $dF(x_1, \dots, x_n)$ is equal to a sum of terms (multiplied by certain
constants) of the form

$\prod_{i=1}^{n} dF(x_i)$,

where $F$ is the cdf of a continuous symmetrical population with zero median.
Thus from (4) and the linear properties of the integral,

$\int_R dF(x_1, \dots, x_n) = p$

if $y_1, \dots, y_n$ are from populations necessarily satisfying only conditions (A).
Q.E.D.

Next confidence intervals of the forms (1) will be derived for $\phi$ on the basis of
conditions (A) and (B). Before stating the theorem on which these confidence
intervals are based consider the following definition of notation. For each
permissible selection of $i$ and $j$ the symbol

$\{i, j\}$, $(1 \le i \le j \le n)$,

denotes an arbitrary but fixed selection of one or both of the inequality signs
$<$, $>$. The selection of both inequality signs, denoted by $\lessgtr$, has the
interpretation

$x_i \lessgtr \phi \equiv -\infty < x_i < \infty$,

$(x_i + x_j)/2 \lessgtr \phi \equiv -\infty < (x_i + x_j)/2 < \infty$.

It is to be noted that $\{r, s\}$ is not necessarily equal to $\{i, j\}$ unless $r = i$ and
$s = j$.

THEOREM 2. Consider the probability statement

(5) $\Pr[(x_i + x_j)/2 \ \{i, j\} \ \phi; \ 1 \le i \le j \le n]$.

Let this statement have the value $q$ if $x_1, \dots, x_n$ are order statistics of a sample of
size $n$ drawn from the uniform population with range $-\frac{1}{2}$ to $\frac{1}{2}$ (then $\phi = 0$). Then
(5) also has the same value $q$ if $x_1, \dots, x_n$ are order statistics of a sample of size $n$ drawn
from any population satisfying conditions (A) and (B).

PROOF. Let $y_1, \dots, y_n$ be a sample of $n$ values from a population satisfying
conditions (A) and (B) while $x_1, \dots, x_n$ are the $y$'s arranged in increasing order
of magnitude. Then there is a monotone function $\pi$ (see [1]) such that $\pi(z)$ will
have the same cdf as $y_i - \phi$ if $z$ is from a uniform population with range $-\frac{1}{2}$ to $\frac{1}{2}$.
Since the $y$'s are from a symmetrical population, $-\pi(z) = \pi(-z)$. Let $x_i - \phi =
\pi(z_i)$, $(i = 1, \dots, n)$, define the $z_i$. Then

$\Pr[(x_i + x_j)/2 \ \{i, j\} \ \phi] = \Pr[\tfrac{1}{2}(\pi(z_i) + \pi(z_j)) \ \{i, j\} \ 0] = \Pr[\pi(z_i) \ \{i, j\} \ \pi(-z_j)]$.

From the monotone and symmetrical properties of the function $\pi$,

$\Pr[\pi(z_i) \ \{i, j\} \ \pi(-z_j)] = \Pr[z_i \ \{i, j\} \ -z_j] = \Pr[(z_i + z_j)/2 \ \{i, j\} \ 0]$.

By hypothesis this last expression has the value $q$, thus completing the proof.
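The key step of the proof, that an increasing odd transformation leaves every comparison of $(x_i + x_j)/2$ with the median unchanged, can be checked numerically. In the sketch below the choice $\pi(z) = \tan(\pi z)$ (which maps the uniform range $-\frac{1}{2}$ to $\frac{1}{2}$ onto a Cauchy population) and the sample values are assumptions made only for illustration; the function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def sign_pattern(values):
    """For every pair i <= j of ordered values, record whether (x_i + x_j)/2 < 0."""
    xs = sorted(values)
    index_pairs = combinations_with_replacement(range(len(xs)), 2)
    return [xs[i] + xs[j] < 0 for i, j in index_pairs]


def odd_monotone(z):
    """An increasing odd function on (-1/2, 1/2); here the Cauchy quantile map."""
    return math.tan(math.pi * z)


# Hypothetical values from the uniform range (-1/2, 1/2), median 0.
z_values = [-0.45, -0.2, 0.05, 0.3, 0.4]
x_values = [odd_monotone(z) for z in z_values]
same_pattern = sign_pattern(z_values) == sign_pattern(x_values)
```

Here `same_pattern` is true for any sample without exact ties, which is why the probability of a statement of the form (5) can be evaluated on the uniform population alone.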
Many of the probability statements of the form (5) have zero probability.
For example, $\Pr[x_1 > \phi, x_2 < \phi, \dots] = 0$. Also many of the sets of the
$\{i, j\}$ result in equivalent probability statements. For example

* - ■ - ■’* 






An immediate consequence of Theorem 2 is that one-sided confidence intervals
can be obtained for $\phi$ by choosing any specified subset of $(x_i + x_j)/2$,
$(1 \le i \le j \le n)$, and considering an arbitrary but fixed order statistic of the
values of this subset. For example, consider the subset consisting of $x_{n-1}$ and
$(x_{n-2} + x_n)/2$. Then

$\Pr\{\max[x_{n-1}, (x_{n-2} + x_n)/2] < \phi\} = \Pr[(x_i + x_j)/2 \ \{i, j\} \ \phi]$,

where $\{i, j\}$ is $<$ if either $i = j = n - 1$, or $i = n - 2$, $j = n$, and is $\lessgtr$ otherwise.

In general, the confidence coefficient of any one-sided confidence interval
formed by considering a certain order statistic of a specified subset of $(x_i + x_j)/2$,
$(1 \le i \le j \le n)$, can be expressed as a sum of probabilities of the form (5),
where $\{i, j\}$ is $\lessgtr$ if $(x_i + x_j)/2$ is not included in the specified subset, $(i \le j)$.

It is usually preferable to select the subset of $(x_i + x_j)/2$, $(1 \le i \le j \le n)$,
in such a way that no two of the elements chosen necessarily satisfy an order
relation.

Satisfactory two-sided confidence intervals can usually be obtained as combinations
of one-sided confidence intervals.


6. Confidence coefficients. The purpose of this section is to show that all
the confidence coefficients for one-sided confidence intervals derived on the basis
of Theorem 2 are of the form $r/2^n$, $(r = 1, \dots, 2^n - 1)$. Also a method of
determining confidence coefficient values for one-sided confidence intervals is
developed.

First a theorem will be presented which shows that each of the one-sided
confidence intervals derived in the preceding section has a confidence coefficient of
the form $r/2^n$, $(r = 1, \dots, 2^n - 1)$. On the basis of Theorem 2 it is sufficient
to prove

THEOREM 3. Let $x_1, \dots, x_n$ be the ordered values of a sample from the uniform
population with range $-\frac{1}{2}$ to $\frac{1}{2}$. Then

$\Pr[(x_i + x_j)/2 \ \{i, j\} \ 0; \ 1 \le i \le j \le n] = r/2^n$,

where $r$ has one of the values $0, 1, \dots, 2^n$. (The symbol $\{i, j\}$ is defined in section 5.)

SKETCH OF PROOF. This theorem is proved by investigating how the hyperplanes

$\frac{1}{2}(x_i + x_j) = 0$, $(1 \le i \le j \le n)$,

intersect the n-dimensional order statistic space for the particular population
considered. It is found that each relation of the form

$\frac{1}{2}(x_i + x_j) \ \{i, j\} \ 0$, $(1 \le i \le j \le n)$,

defines a region of the n-dimensional order statistic space which consists of a
certain number $r$ of n-dimensional cells, each of which has an n-dimensional
"volume" equal to $(\frac{1}{2})^n$. A detailed proof of this theorem is given in [3].
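For small $n$ the count $r$ can be obtained by brute force. The sketch below is not the cell-counting proof; it assumes the standard fact that, for a continuous population symmetric about zero, the signs of the observations are independent of their magnitudes and each of the $2^n$ sign patterns is equally likely. Whether $\frac{1}{2}(x_i + x_j) < 0$ holds depends only on the signs and on the ranking of the magnitudes, so the magnitudes may be fixed at $1, \dots, n$. The function name and argument layout are hypothetical.

```python
from itertools import product


def confidence_coefficient(n, pairs):
    """Return (r, 2**n) for the event that every listed mean is below the median.

    `pairs` holds 1-based index pairs (i, j) into the ordered sample
    x_1 <= ... <= x_n; the pair (i, i) stands for the single order statistic x_i.
    """
    r = 0
    for signs in product((-1, 1), repeat=n):
        # Attach the signs to the fixed magnitudes 1..n, then order the values.
        xs = sorted(sign * magnitude
                    for sign, magnitude in zip(signs, range(1, n + 1)))
        if all(xs[i - 1] + xs[j - 1] < 0 for i, j in pairs):
            r += 1
    return r, 2 ** n


# max[x_5, (x_4 + x_6)/2] < phi for n = 6
r6, cells6 = confidence_coefficient(6, [(5, 5), (4, 6)])
```

For this interval the count is 3 out of 64 patterns, i.e. a confidence coefficient of $3/2^6 \approx .0469$. The enumeration visits all $2^n$ sign patterns, so it is practical only for small $n$.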

Next a method will be developed whereby confidence coefficient values can
be determined for any one-sided confidence interval of the form

(6) $\frac{1}{2}(x_i + x_j) \ \{i, j\} \ \phi$, $(1 \le i \le j \le n)$.

For this purpose it is sufficient to derive a procedure for determining the
confidence coefficient of any confidence interval of the form

(7) $\max[\text{certain subset of } \frac{1}{2}(x_i + x_j), \ 1 \le i \le j \le n] < \phi$.

The confidence coefficient of any one-sided confidence interval of the form
$\min[\ ] > \phi$ can be obtained by symmetry. The confidence coefficient of any
other one-sided confidence interval of the form (6) can be found by expressing
the value of

$\Pr[\frac{1}{2}(x_i + x_j) \ \{i, j\} \ \phi]$

as a sum of terms of the form $\Pr\{\max[\ ] < \phi\}$ or as a sum of terms of the form
$\Pr\{\min[\ ] > \phi\}$. That this is always possible for one-sided confidence intervals
of the form (6) is shown by direct application of the results of page 37 of [6].
It is not difficult to show that any one-sided confidence interval of the form
(7) can be expressed in the form

mu\ t^r(?i — k ), + 1) T ;r(n - ntt> — k -T 1)] ( ' i 

^[s(u) + - Wi)]| < 


where $x(i) = x_i$, $(i = 1, \dots, n)$, and $m_1, \dots, m_k$ are $k$ integers such that

$n > m_1 > m_2 > \cdots > m_k > 0$.

This is done by choosing $k, m_1, \dots, m_k$ so that the two confidence intervals are
equivalent.

Thus it is sufficient to prove the following theorem.

THEOREM 4. Let $x(1), \dots, x(n)$ represent the ordered values of $n$ independent
observations drawn from populations satisfying conditions (A). Choose a set of $k$
integers $m_1, \dots, m_k$ such that

$n > m_1 > m_2 > \cdots > m_k > 0$.


>o, 


( 8 ) 


'] 


Then the one-sided confidence interval 

max — k) t ^[a:(w. — k d* l) + s(a — in* — L *p l)j, , 

( 3 ) 

+ *bi - aq))} < tf>, 

where a term of the form - h +■ 1) -f :c(?t - 3>u — A -\- 1)], (h =* 1, . 

is lo be deleted if 3 n — ms — h + 1 = 0, 7ios the confidence coefficient 

[ nia mi 

i + tfti + 2 6 »i - ti) + X! 2 (»h - ii — i' 3 ) 

11*1 (jwl {|mJ 

mjt uiA-i— 0-1 n i — U --1 

+ ■'+ £ £ ■ • s Cm,- 1 { -.■ ~ u ,) 

'i-a-L n-l ' 

SKETCH OF PROOF. It is sufficient to consider the case in which the $n$ observations
are a sample from the uniform population with range $-\frac{1}{2}$ to $\frac{1}{2}$ (then $\phi = 0$).
Let us consider the region of the n-dimensional order statistic space defined by
(8). This region can be considered as an intersection of n-dimensional regions
each of which is completely defined by a certain region in an $x_i, x_j$ plane,
$(1 \le i < j \le n)$. Also the n-dimensional "volume" of this region equals the
value of the confidence coefficient of the confidence interval (8).

By Theorem 3, the intersection region of (8) consists of a certain number of
basic cells, each of n-dimensional "volume" $(\frac{1}{2})^n$. Theorem 4 is proved by
a method of counting the number of cells in this intersection
on the basis of the corresponding regions in the $x_i, x_j$ planes. It is found
that the intersection region consists of


m fe 


1 + fth + * +2 

**-. L —1 


.■X (llll — l!, “ * » * 




“ U 1 ) 


basic cells. A detailed derivation of this expression is given in [3].
m L u tho application of Theorem 4 u[ H M {{ 

vai ' ^ ,Bl9 ' Then, by Theorem 4, t!\o immautcd cLmhdiaico inter- 


__ mai ' t 11 > K 1 ® + * 7 ), l(t|Q + I,)] < $ 

> For the tnvul oasc m which t - , t the v„|uo ot (0) „ unlly , 



SIGNIFICANCE TESTS FOR THE MEDIAN


has a confidence coefficient equal to $103/2^{11}$. If $n = 12$ instead of 11, the confidence coefficient would be $103/2^{12}$ while the confidence interval becomes

niii\ |i», ifn,, H- j s ), J(.m 4- .to), H^n -I- ^01 < <t> 

As another example, let $n = 11$ and consider the confidence interval

M*i\ I'a , jO'a I" f7), ^Ckio 4* a’fi)j iGtu H- aj)] < 

IJoir / 'I and "oiji[Mii-nn uitli (8) shows Unit ting confidence mlcival satis- 
hnl flu mein l mih irh 7, ))h = u, oij *= 2 Thus it lms a confidence uocfR- 
i ii nl rqu il ft> a I 2 U 

Theorem 3 shows that each one-sided confidence interval developed on the basis of Theorem 2 has a confidence coefficient of the form $r/2^n$, $(0 < r < 2^n)$. The question arises as to whether the one-sided confidence intervals defined by Theorem 4 have confidence coefficients which attain each of the values $1/2^n, 2/2^n, \ldots, (2^n - 1)/2^n$. That this is not the case is proved as follows. The totality of different confidence intervals of the form (8) is equal to $2^n - 1$. This is shown by counting how many ways the integers $m_1, \ldots, m_l$ can be selected subject to the conditions $n \ge m_1 > m_2 > \cdots > m_l > 0$. It is easily seen that there are $\binom{n}{l}$ possible ways. Summing over the possible values of $l$ yields $2^n - 1$. This figure is increased to $2^n$ if the trivial confidence interval $-\infty < \phi$ is also included. Examination of (9) shows, however, that two different selections of $l, m_1, \ldots, m_l$ will result in the same value of (9) for more than one case. Thus the one-sided confidence intervals of Theorem 4 do not have confidence coefficients which attain each of the values $1/2^n, \ldots, (2^n - 1)/2^n$.
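The counting argument can be checked by brute force for small $n$. The sketch below is only an illustration: the function name `count_selections` is hypothetical, and it relies on the observation that a strictly decreasing set $n \ge m_1 > \cdots > m_l > 0$ is the same thing as an $l$-element subset of $\{1, \ldots, n\}$.

```python
from itertools import combinations

def count_selections(n):
    """Count the integer sets n >= m_1 > m_2 > ... > m_l > 0 for each l = 1..n.

    Each such set is an l-element subset of {1, ..., n} written in
    decreasing order, so enumerating subsets enumerates the selections.
    """
    per_l = {l: sum(1 for _ in combinations(range(1, n + 1), l))
             for l in range(1, n + 1)}
    return per_l, sum(per_l.values())

per_l, total = count_selections(5)
print(per_l)   # number of selections for each l
print(total)   # total over l = 1..n, to be compared with 2**n - 1
```

The enumeration says nothing about which selections share a confidence coefficient; it only confirms the total number of intervals of the form (8).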

Although the class of one-sided confidence intervals defined by Theorem 4 do not have confidence coefficients which attain each of the values $1/2^n, 2/2^n, \ldots, (2^n - 1)/2^n$, they do have another property which is important from a practical point of view. If a certain confidence coefficient can be obtained for a particular value of $n$, then this confidence coefficient can also be obtained for all greater values of $n$. This result is a consequence of the following theorem.

Theorem 5. Let $x(1), \ldots, x(n)$ be the ordered values of $n$ independent observations drawn from populations satisfying conditions (A). Then if a confidence interval of the form (8) has the confidence coefficient $\epsilon$ for a certain value $n_0$ of $n$, it is always possible to obtain another confidence interval of the form (8) which has the confidence coefficient $\epsilon$ for the value $n_0 + 1$.

Proof. Let $m_1, \ldots, m_l$ be the integers corresponding to the given confidence interval of form (8). These integers satisfy the condition

$n_0 \ge m_1 > m_2 > \cdots > m_l > 0.$

L,a , la be ifpliiccil by «. H- l ami cons,Jot the neu set of mtegeis (m, + 1), 
( 1 , 1 - Hi),' , -t- 1). *■ Evidcnily 

n, + 1 ^ ,; (i + 1 > 


> ,m + l > 1 > 0. 



JOHN E. WALSH


Hence these integers can be used to define a confidence interval of the form (8), 
Also it is easily verified that


mj-H 

1 + (mi + 1) + £ (mi 4* 1 - *0 

ij-i 

rfil+I ntif-l-lj- "'K. I 

+ + E • - 2 + 1 ~ !i "■ 

'i-i 

1 ift iffi+l'-H*' ♦“h 

+ S E D ("‘i - i - «» - • ■ 

fa ‘i-i 

mu 


^ U j) 


- n) 


- TlJ 

= 2 1 4 nn 4 £ (flit — t‘i) + ■»• + 2 

<i-i 0_i-i 


Blj-ll- — < * — 1 

S ( J,, l “ ll — 

I 1—1 


<1 .)] . 


Thus the new confidence interval has the same confidence coefficient as the given confidence interval.

From symmetry considerations, the one-sided confidence interval

$\min\{x(l+1),\ \tfrac{1}{2}[x(l) + x(m_l + l)],\ \ldots,\ \tfrac{1}{2}[x(1) + x(m_1 + 1)]\} > \phi,$

where a term of the form $\tfrac{1}{2}[x(k) + x(m_k + k)]$, $(k = 1, \ldots, l)$, is to be deleted if $m_k + k = n + 1$, has the same confidence coefficient as the one-sided confidence interval (8), i.e., its confidence coefficient is given by (9).


7. Efficiency of some tests based on conditions (A). Let us consider the case in which the $n$ observations used for a test are a sample from a normal population with unknown variance. The purpose of this section is to investigate the efficiency of some tests based on conditions (A) for this special case.
The method used to obtain efficiencies is outlined in section 3. Only one-sided and symmetrical tests are considered. For this purpose it is sufficient to limit investigations to one-sided tests of $\phi < \phi_0$.

If the subset of $\tfrac{1}{2}(x_i + x_j)$, $(1 \le i \le j \le n)$, chosen for a test is not of one of the forms


V*/ *1 


(t < J), 

(i < J < k\ 


(b) + a/), 

(c) + fffc), v - ^ - *vp, 

the determination of power function values requires a numerical double or higher order integration. Such numerical integrations are extremely lengthy. For this reason only one-sided significance tests based on subsets of the forms (a)-(c) will be investigated.

Let the normal population have variance $\sigma^2$ and consider one-sided tests of $\phi < \phi_0$ based on subsets of the form (a). Then





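Power Function $= \Pr(x_i < \phi_0) = \sum_{k=i}^{n} \dfrac{n!}{k!\,(n-k)!}\,[\Phi(\delta)]^{k}\,[1 - \Phi(\delta)]^{n-k},$

where

$\delta = (\phi_0 - \phi)/\sigma, \qquad \Phi(\delta) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\delta} e^{-y^2/2}\, dy.$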

The power function values listed for the test $x_i < \phi_0$ in Table 2 were computed from the above expression. The corresponding values for the $t$-test were computed from the normal approximation given in [2].
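For a test of form (a), the power is the probability that the $i$th order statistic of a normal sample falls below $\phi_0$, which can be written as a binomial tail sum in $\Phi(\delta)$ with $\delta = (\phi_0 - \phi)/\sigma$. The sketch below evaluates that sum. The function names are hypothetical and nothing here reproduces the entries of Table 2.

```python
from math import comb, erf, sqrt

def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def power_order_statistic_test(n, i, delta):
    """Pr(x(i) < phi_0) for a normal sample of size n.

    delta = (phi_0 - phi) / sigma.  The i-th smallest observation is below
    phi_0 exactly when at least i of the n observations are below phi_0.
    """
    p = normal_cdf(delta)
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(i, n + 1))

# At delta = 0 (phi = phi_0) the value is the significance level of the test.
print(power_order_statistic_test(5, 5, 0.0))
print(power_order_statistic_test(5, 5, 1.0))
```

At $\delta = 0$ the sum reduces to a binomial tail with $p = \tfrac{1}{2}$, so it gives the level of the corresponding order-statistic test.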

For subsets of forms (b) and (c) the expression for the power function is more complicated and will not be either derived or stated here. For any particular case, however, a simple analysis will yield an expression for the power function which requires only a first order numerical integration. General expressions for the power functions when the subsets are of the forms (b) and (c) are stated and derived in [5].

Table 2 contains power function values and efficiencies for several tests based on subsets of the forms (b) and (c). The power function values were computed by approximate integration (Simpson's rule, etc.). The $t$-test power function values were obtained by using the normal approximation. The power efficiencies listed in Table 1 for tests which do not appear in Table 2 were computed in [5], where a table of power function values is also given.
Examination of Table 2 shows that many of the tests formed from subsets of types (b) and (c) are very efficient for small values of $n$. The efficiency appears to decrease as $n$ increases. Also the efficiency of a test depends strongly on the subset of $\tfrac{1}{2}(x_i + x_j)$, $(1 \le i \le j \le n)$, used to form the test. For example, let $n = 10$. The test


Accept 0 < 0 O ?/ max [r», + iio)] < <j>a 

has a significance level of approximately .01 but an efficiency of only 82%. However the test


Accept 0 < 0o if maxima, + ^io)] < <k 

also him a Mgmficnnce level of approximately .01 but an efficiency of 90 5% 
An approximate set of rules for picking subsets which result in efficient tests of $\phi < \phi_0$ is suggested by the results of Table 2. Let $x(i_1), \ldots, x(i_r)$ be the order statistics which make up the elements of the particular subset of $\tfrac{1}{2}(x_i + x_j)$, $(1 \le i \le j \le n)$, to be used for the test. The approximate rules are:
1. Use the maximum of the values of the elements of the subset.
2. Choose $i_1, \ldots, i_r$ so that $\max(i_1, \ldots, i_r) = n$ and $\min(i_1, \ldots, i_r)$ is as large as possible subject to the restriction that the test is to have a significance level of a specified order of magnitude.
Symmetry considerations furnish the corresponding set of rules for obtaining efficient tests of $\phi > \phi_0$.





Other tests at approximately the same significance level, but not based on subsets of the forms (a)-(c), are undoubtedly more efficient than many of the tests considered in Tables 1 and 2 (particularly for the larger values of $n$). Computational difficulties, however, prevent consideration of more general situations.

8. A general solution.* A general method of obtaining one-sided tests of $\phi < \phi_0$ and $\phi > \phi_0$, also symmetrical tests of $\phi \ne \phi_0$, on the basis of conditions (A) is the following.

Let $y_1, \ldots, y_n$ be $n$ independent observations drawn from populations satisfying conditions (A). Let

$z_i = y_i - \phi_0, \qquad (i = 1, \ldots, n).$

If the null hypothesis of $\phi = \phi_0$ is satisfied, each $z_i$ is an observation from a population satisfying conditions (A) with zero median. Consider the $2^n$ sets of values obtained by the transformations

$z_i \to \epsilon(i)\, z_i, \qquad (i = 1, \ldots, n),$

where $\epsilon(i)$ is one of the signs $+$ or $-$. Form the mean of each of the $2^n$ sets of values. Then it is readily seen, from conditions (A), that the probability that $\bar z\ (= \sum z_i/n)$ is less than the $(r + 1)$th largest of the $2^n$ means has the value $r/2^n$ when the null hypothesis is true. Similarly the probability that $\bar z$ is greater than the $(2^n - r)$th largest of the $2^n$ means is equal to $r/2^n$ if the null hypothesis of $\phi = \phi_0$ is satisfied. Thus the test

Accept $\phi < \phi_0$ if $\bar z$ is less than the $(r + 1)$th largest of the $2^n$ means

is a one-sided test of $\phi < \phi_0$ with significance level equal to $r/2^n$. Likewise the one-sided test

Accept $\phi > \phi_0$ if $\bar z$ is greater than the $(2^n - r)$th largest of the $2^n$ means

has the significance level $r/2^n$. Consequently the symmetrical test

Accept $\phi \ne \phi_0$ if $\bar z$ is either less than the $(r + 1)$th largest or greater than the $(2^n - r)$th largest of the $2^n$ means

has a significance level equal to $2r/2^n$.

The application of any of the above tests requires the computation of the $2^n$ means and a determination of where $\bar z$ falls in the ordering of these means. If $n = 5$, only 32 means need be computed. If $n = 10$, however, 1024 means must be computed. Evidently this test is too cumbersome to apply except for very small values of $n$.
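The procedure of this section is easy to mechanize for small $n$. The following sketch forms all $2^n$ sign-changed means and applies the one-sided rule for $\phi < \phi_0$. It rests on one stated assumption: "the $(r+1)$th largest" is read as the $(r+1)$th mean counted upward from the smallest, which is the reading under which the stated level $r/2^n$ holds. The function names are hypothetical.

```python
from itertools import product

def sign_flip_means(z):
    """All 2**n means obtained by attaching signs + or - to the z values, sorted ascending."""
    n = len(z)
    return sorted(sum(s * v for s, v in zip(signs, z)) / n
                  for signs in product((1, -1), repeat=n))

def accept_phi_less(y, phi0, r):
    """One-sided test of phi < phi0 at nominal level r / 2**n.

    Assumption: the comparison value is the (r+1)th mean counted from
    the smallest, i.e. index r of the ascending list.
    """
    z = [v - phi0 for v in y]
    means = sign_flip_means(z)
    zbar = sum(z) / len(z)
    return zbar < means[r]

y = [1, 2, 3, 4, 5]
print(len(sign_flip_means(y)))        # 2**5 means for n = 5
print(accept_phi_less(y, 10, r=1))    # every observation lies well below phi0
print(accept_phi_less(y, 0, r=1))     # every observation lies above phi0
```

The cost grows as $2^n$, which is the practical limitation noted in the text.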


9. Acknowledgements. The author would like to express his appreciation to Professors S. S. Wilks and John W. Tukey for valuable advice and assistance in the preparation of this paper, also to Mrs. Ruth S. Slutfct for computational assistance.

* This solution was derived independently by E. J. G. Pitman and the author. The fundamental idea on which the solution is based was presented by R. A. Fisher in [7].


REFERENCES

[1] John E. Walsh, "Applications of some significance tests for the median which are valid under very general conditions," submitted to Am. Stat. Assn. Jour.

[2] N. L. Johnson and B. L. Welch, "Applications of the non-central t distribution," Biometrika, Vol. 31 (1940), p. 376.

[3] John E. Walsh, "On the power function of the sign test for slippage of means," Annals of Math. Stat., Vol. 17 (1946), pp. 358-362.

[4] H. Scheffé and J. W. Tukey, "Non-parametric estimation. I. Validation of order statistics," Annals of Math. Stat., Vol. 16 (1945), pp. 187-192.

[5] John E. Walsh, "Some significance tests for the median which are valid under very general conditions," unpublished thesis, Princeton University.

[6] G. Udny Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin and Co., 1937.

[7] R. A. Fisher, The Design of Experiments, Oliver and Boyd, 1942.



A DIRECT METHOD FOR PRODUCING RANDOM DIGITS IN ANY NUMBER SYSTEM

By H. Burke Horton and R. Tynes Smith III
Interstate Commerce Commission

1. Summary. A compounding technique first used to produce random binary digits is generalized and extended to other number systems. Formulae for the rate of convergence of probabilities to the desired values are derived. The method is extended to the production of random digits with fixed but unequal probabilities. Numerical results are presented in summary form together with results of tests applied to a set of random digits produced by the method.

2. Introduction. In a note [1] by one of the authors a method of producing random digits was presented. The method was based upon a process, designated "compound randomization," used to produce random binary digits which can be converted to random digits in other number systems by simple methods. Despite the ease of converting a random binary series to another system, it is of interest to examine the problem of direct production of random digits in any number system. In the course of producing random binary digits with machine tabulating equipment, and while designing an electronic device to produce random binary digits, it was noted that the multiplication process described in the earlier paper was the equivalent of addition modulo 2 of a set of binary digits. This observation laid the basis for generalizing to other number systems.¹

3. Initial conditions and notation. Let us assume that there is available a source of digits, $0, 1, 2, \ldots, (n - 1)$, in a number system of base $n$, where $n$ is a positive integer, $n > 1$. Let $p_{r,s}$ represent the probability of obtaining the $r$th digit in the $s$th trial. Assume that initial conditions can be controlled so that the trials are independent² and

(3.1)  $p_{r,s} \ge \epsilon,$

where $0 < \epsilon \le 1/n$ is a fixed positive number. (It may be noted at this point that conventional "single-stage" methods of producing random numbers are based upon the assumption that $p_{r,s} = \epsilon = 1/n$.) Let $\pi_{r,s}$ represent the probability of obtaining the $r$th digit by addition modulo $n$ of the digits obtained in $s$ individual trials. In order to express $\pi_{r,s}$ in terms of $p_{r,s}$, consider two sets of matrices whose elements are defined as follows.

¹ In acting as referee for [1] Dr. George W. Brown suggested generalizing to other number systems by addition modulo $n$.

² J. E. Walsh [2] has considered, in terms of conditional probabilities, the effect of intercorrelation on compound randomization in the binary system.




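(3.2)  $a_s = \begin{pmatrix} p_{0,s} & p_{n-1,s} & \cdots & p_{1,s} \\ p_{1,s} & p_{0,s} & \cdots & p_{2,s} \\ \vdots & \vdots & & \vdots \\ p_{n-1,s} & p_{n-2,s} & \cdots & p_{0,s} \end{pmatrix}$

(3.3)  $\alpha_s = \begin{pmatrix} \pi_{0,s} & \pi_{n-1,s} & \cdots & \pi_{1,s} \\ \pi_{1,s} & \pi_{0,s} & \cdots & \pi_{2,s} \\ \vdots & \vdots & & \vdots \\ \pi_{n-1,s} & \pi_{n-2,s} & \cdots & \pi_{0,s} \end{pmatrix}$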

Note that $a_s$ and $\alpha_s$ are Markoff matrices with two additional restrictions: (1) there are no zero elements, and (2) column (as well as row) sums are unity. Each $n \times n$ matrix is made up of only $n$ distinct elements, namely, the $n$ different probabilities associated with the $s$th trial for $a_s$, or the $n$ different probabilities associated with the sum of $s$ trials for $\alpha_s$.

4. Relation of $\pi_{r,s}$ to $p_{r,s}$. Assuming independent trials, we have the following relationships:

(4.1)
$\alpha_1 = a_1,$
$\alpha_2 = a_2\,\alpha_1 = a_2\, a_1,$
$\alpha_3 = a_3\,\alpha_2 = a_3\, a_2\, a_1,$
$\cdots$
$\alpha_k = a_k\,\alpha_{k-1} = \prod_{i=1}^{k} a_i.$

Thus, since any row (or any column) of $\alpha_k$ is a permutation of the $\pi_{r,k}$, by (4.1) the $\pi_{r,k}$ are expressed in terms of the individual probabilities, $p_{r,s}$.
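The relation between the compound probabilities and the single-trial probabilities can be illustrated numerically. The sketch below assumes the matrix of a trial is the circulant whose $(i, j)$ entry is $p_{(i-j) \bmod n}$, and compares the matrix product with a direct computation of the distribution of a sum modulo $n$. The helper names and the base-3 probabilities are hypothetical.

```python
from fractions import Fraction

def circulant(p):
    """Matrix whose (i, j) entry is p[(i - j) mod n]; rows and columns each sum to sum(p)."""
    n = len(p)
    return [[p[(i - j) % n] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain n x n matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compound(trials):
    """Distribution of the sum modulo n of one digit taken from each trial."""
    n = len(trials[0])
    pi = list(trials[0])
    for p in trials[1:]:
        # The new sum is r when the new digit is (r - j) mod n and the old sum is j.
        pi = [sum(p[(r - j) % n] * pi[j] for j in range(n)) for r in range(n)]
    return pi

# Hypothetical base-3 source with the same probabilities in every trial.
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
pi_2 = compound([p, p])
alpha_2 = matmul(circulant(p), circulant(p))
first_column = [row[0] for row in alpha_2]
print(pi_2)
print(first_column == pi_2)      # both routes give the same probabilities
print(max(pi_2) - min(pi_2))     # range after two trials
```

Exact fractions are used so the comparison between the two routes is not blurred by rounding.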

5. Convergence of $\pi_{r,s}$ to $1/n$. (5.01) Theorem.³ $\lim_{s\to\infty} \pi_{r,s} = 1/n$.

Proof. Let $\rho_s$ denote the range of the elements of $\alpha_s$. Each element of $\alpha_s$ is a weighted mean of the $n$ distinct elements of $\alpha_{s-1}$. The $n$ distinct elements of $a_s$ are used as weights in the averaging process. Now the range of a set of weighted means (weights $> 0$) of a set of values must be less than the range of the values themselves, unless both ranges are zero. Therefore, since the weights $p_{r,s} > 0$ by condition (3.1),

(5.02)  $\rho_s < \rho_{s-1}$, for $\rho_{s-1} \ne 0$; or in the special case $\rho_{s-1} = 0$, $\rho_s = 0$.

And since $\sum_{r=0}^{n-1} \pi_{r,s} = 1$,

"7W|„ u . U«, M «HU..Sp«bh«H<m. J. 'Voiron.U,ntl«pondcBlly proved tboorom 

(SOI) 





(5.03)  $1/n - \rho_s \le \pi_{r,s} \le 1/n + \rho_s.$

In order to show that $\lim_{s\to\infty} \rho_s = 0$, and to derive formulae for the rate of convergence of $\pi_{r,s}$ to the limiting value, $1/n$, let $w_i$ represent the ordered $p_{r,s}$ for any given $s$: $w_1$ = the smallest $p_{r,s}$, $w_n$ = the largest of the $p_{r,s}$. In a similar manner let $x_i$ represent the ordered $\pi_{r,s-1}$. The following inequalities for the maximum and minimum $\pi_{r,s}$ can be set down immediately:

(5.04)  $\max_r \pi_{r,s} \le w_n x_n + w_{n-1} x_{n-1} + \cdots + w_1 x_1,$

(5.05)  $\min_r \pi_{r,s} \ge w_n x_1 + w_{n-1} x_2 + \cdots + w_1 x_n.$

And since $\rho_s = \max_r \pi_{r,s} - \min_r \pi_{r,s}$,

(5.06)  $\rho_s \le w_n(x_n - x_1) + w_{n-1}(x_{n-1} - x_2) + \cdots + w_2(x_2 - x_{n-1}) + w_1(x_1 - x_n).$

For $n$ even, let $m = n/2 + 1$; then by regrouping terms,

(5.07)  $\rho_s \le (w_n - w_1)(x_n - x_1) + (w_{n-1} - w_2)(x_{n-1} - x_2) + \cdots + (w_m - w_{m-1})(x_m - x_{m-1}).$

Noting that $\rho_{s-1} = (x_n - x_1) \ge (x_{n-1} - x_2) \ge \cdots \ge (x_m - x_{m-1})$, the following substitutions can be made:

(5.08)  $\rho_s \le (w_n - w_1)\rho_{s-1} + (w_{n-1} - w_2)\rho_{s-1} + \cdots + (w_m - w_{m-1})\rho_{s-1}.$

For compactness, this may be written,

(5.09)  $\rho_s \le \left[\sum_{i=m}^{n} w_i - \sum_{i=1}^{m-1} w_i\right] \rho_{s-1}.$

Similarly for $n$ odd, let $m = (n + 1)/2$; proceeding in the same manner as above, the median term vanishes, yielding as a final result,

(5.10)  $\rho_s \le \left[\sum_{i=m+1}^{n} w_i - \sum_{i=1}^{m-1} w_i\right] \rho_{s-1}.$

For simplicity denote the expression in brackets by $\delta_s$; then

(5.11)  $\rho_s \le \delta_s\, \rho_{s-1},$

where for $n$ even, $\delta_s$ represents the sum of the largest $n/2$ of the $p_{r,s}$ minus the sum of the smallest $n/2$ of the $p_{r,s}$, and for $n$ odd $\delta_s$ represents the sum of the largest $(n - 1)/2$ of the $p_{r,s}$ minus the sum of the smallest $(n - 1)/2$ of the $p_{r,s}$.

Continuing the process developed above, we find that

(5.12)  $\rho_s \le \delta_s\, \delta_{s-1}\, \rho_{s-2},$

$\cdots$

(5.13)  $\rho_s \le \delta_s\, \delta_{s-1}\, \delta_{s-2} \cdots \delta_2\, \rho_1.$



Since $\rho_1 \le \delta_1$, the following simple inequality holds:

(5.14)  $\rho_k \le \prod_{s=1}^{k} \delta_s.$

Now $\delta_s \le 1 - n\epsilon$, by condition (3.1) and the definition of $\delta_s$. Therefore,

(5.15)  $\lim_{k\to\infty} \rho_k \le \lim_{k\to\infty} \prod_{s=1}^{k} \delta_s \le \lim_{k\to\infty} (1 - n\epsilon)^k = 0,$

and (5.01) is proven. In the special case of constant probabilities from trial to trial, $\delta_s = \delta_0$, a constant, and (5.14) becomes

(5.16)  $\rho_k \le (\delta_0)^k.$

Since the mean $\pi_{r,k}$ is $1/n$, we have the following useful inequalities:

(5.17)  $1/n - \prod_{s=1}^{k} \delta_s \le \pi_{r,k} \le 1/n + \prod_{s=1}^{k} \delta_s,$

in the case of varying probabilities, and

(5.18)  $1/n - (\delta_0)^k \le \pi_{r,k} \le 1/n + (\delta_0)^k,$

in the case of constant probabilities. If $\delta_s$ is not known in each trial, an upper bound, $\delta_b$, may be estimated on the basis of knowledge (including statistical tests) of the digit generating process. Then the following inequality will hold:

(5.19)  $1/n - (\delta_b)^k \le \pi_{r,k} \le 1/n + (\delta_b)^k,$

where $\delta_b \le (1 - n\epsilon)$.

It is worthy of note that inequalities (5.14) and (5.16) become equalities if $n = 2$ (binary system), thus,

(5.14b)  $\rho_k = \prod \delta_s = \prod |p_s - q_s| = \prod |2p_s - 1|,$

(5.16b)  $\rho_k = (\delta_0)^k = |p - q|^k = |2p - 1|^k.$

These results were obtained by different methods in [1].
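The bound for constant probabilities can be checked on small examples. The sketch below computes $\delta$ as the sum of the largest $\lfloor n/2 \rfloor$ probabilities minus the sum of the smallest $\lfloor n/2 \rfloor$, computes the exact range after $k$ trials by repeated addition modulo $n$, and compares the two. The function names and the probability vectors are hypothetical illustrations, not values taken from Table 1.

```python
from fractions import Fraction

def delta(p):
    """Sum of the largest floor(n/2) probabilities minus the sum of the smallest floor(n/2)."""
    w = sorted(p)
    half = len(w) // 2
    return sum(w[len(w) - half:]) - sum(w[:half])

def range_after(p, k):
    """Range (max minus min) of the distribution of the sum modulo n of k trials."""
    n = len(p)
    pi = list(p)
    for _ in range(k - 1):
        pi = [sum(p[(r - j) % n] * pi[j] for j in range(n)) for r in range(n)]
    return max(pi) - min(pi)

p3 = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]   # base 3, biased
p2 = [Fraction(4, 5), Fraction(1, 5)]                    # base 2, biased

for k in (1, 2, 3):
    # exact range next to the bound delta**k
    print(k, range_after(p3, k), delta(p3) ** k)

print(range_after(p2, 3) == delta(p2) ** 3)   # binary case: bound attained
```

In the binary example the range and the bound coincide, in line with the remark that the inequalities become equalities for $n = 2$; for base 3 the bound is conservative.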

6. Discussion of results. Certain facts are implicit in the foregoing analysis, but are worthy of mention in passing. The compounding process may consist of addition modulo $n$ of digits taken from a number of digit-producing machines. If any machine, $h$, is perfect, i.e., $p_{r,h} = 1/n$ for all $r$, each element of the probability matrix $a_h$ will be equal to $1/n$, and $\rho_h = 0$. Consequently, each element of $\alpha_s$, $s \ge h$, will be equal to $1/n$ by (5.17) and the special case of (5.02). Thus any combination which contains a perfect machine is perfect. This is equivalent to a restatement of von Mises' [3] requirement that the sum of a random set and any other set must itself be a random set. Furthermore, by (5.02) the results taken from any machine, no matter how nearly perfect, can be improved by combining with the results of another machine, no matter how biased the latter may be. In the limiting case, $p_{r,s} = 1$ (or 0), the probabilities of the various digits are merely interchanged.


7. Production of random numbers with fixed but unequal probabilities. The principles presented above can be adapted to the production of random numbers with unequal probabilities as follows: Assume that a set of random digits $0, 1, 2, \ldots, (n - 1)$, is required in a number system of base $n$, with probabilities $q_0, q_1, q_2, \ldots, q_{n-1}$, $\sum_{i=0}^{n-1} q_i = 1$, where each $q_i$ is a proper rational fraction which may be written as the quotient of two positive integers, $q_i = u_i/v_i$. Choose $m$ as the basis of a new number system, where $m$ is the least common multiple of the $v_i$,

(7.1)  $q_i = \dfrac{u_i}{v_i} = \dfrac{m u_i / v_i}{m}.$

A set of random digits, $0, 1, 2, \ldots, (m - 1)$, in a number system of base $m$ may be generated by the process described above, or a set of such digits may be constructed by entering an existing table of random digits, base $n$, and interpreting appropriate numerical quantities, base $n$, as digit symbols, base $m$. Since $m u_i / v_i$ is an integer, groups of $m u_0/v_0,\ m u_1/v_1,\ \ldots,\ m u_{n-1}/v_{n-1}$ digits in the $m$ system may be coded as the digits $0, 1, 2, \ldots, (n - 1)$ in the $n$ system. An upper bound for the maximum bias of $q_i$ will be $(m u_i / v_i)\,\rho_k$, where $\rho_k$ is the range of $\pi_{r,k}$ in the $m$ system. Thus, by increasing $k$, the bias of $q_i$ can be made smaller than any preassigned quantity.
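The coding step can be sketched as a lookup table. Given the target probabilities as exact fractions, the code below finds $m$ as the least common multiple of the denominators and assigns $m q_i$ consecutive base-$m$ digits to digit $i$. The function name, the choice of consecutive blocks, and the example probabilities are assumptions made for illustration; the paper does not prescribe which base-$m$ digits are grouped together. `math.lcm` requires Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm

def coding_table(q):
    """Return (m, table), where table[d] is the output digit for base-m digit d.

    q is a list of Fractions summing to 1.  Output digit i receives
    m * q[i] of the m equally likely base-m digits (assigned here in
    consecutive blocks, which is an arbitrary choice).
    """
    m = lcm(*(f.denominator for f in q))
    table = []
    for digit, f in enumerate(q):
        table.extend([digit] * int(f * m))
    return m, table

# Hypothetical target probabilities for a base-3 output.
q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
m, table = coding_table(q)
print(m)       # base of the auxiliary number system
print(table)   # output digit assigned to each base-m digit
```

If the base-$m$ digits are equally likely, output digit $i$ appears with probability exactly $q_i$; any residual bias in the base-$m$ digits carries over in proportion to the size of each group.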


8. Convergence under more general conditions. Convergence of $\pi_{r,s}$ to $1/n$ occurs under a variety of conditions less restrictive than (3.1).

(81) Theorem In the case of independent trials t a necessary and Huffictcpl 

condition tkat\iTav ri = l/n is that t, where t is a fixed postiivt numlnr, 

arbitrarily small, and i is a fixed positive integer, arbitrarily large, It is obvious 
that (8,1) is a necessary condition for conveigence, Tn prove that it in u miflh 
cient condition, consider the following', 

(8 2) Lemma. If £ Vi where tj is a fixed positive number , arbitrarily small, 
then lim ir„ - l/n 

l-tos ! 

Proof. Take a fixed integer, $h \ge n - 1$. Now any digit, $r$, can be obtained in at least one way, i.e., as the sum of $r$ ones and $(h - r)$ zeros. Therefore,

^ *>& ^ r, where r =■ rf 





We now regard $h$ trials as a single trial of a complex machine. Let $u$ represent the number of such complex trials. Let $\pi'_{r,u}$ represent the probability of obtaining the $r$th digit as the result of addition modulo $n$ of $u$ complex trials. Then,

(8.3)  $\lim_{u\to\infty} \pi'_{r,u} = \lim_{u\to\infty} \pi_{r,(uh)} = 1/n,$

by (5.01). Now $s = uh + j$, $(0 \le j < h$, $j$ an integer$)$, or $uh \le uh + j < (u + 1)h$. The $j$ simple trials cannot increase the maximum bias, by (5.02); consequently,

(8.4)  $\lim_{u\to\infty} \pi_{r,(uh+j)} = 1/n.$

Since there is a one-to-one correspondence between the elements of $\{s\}$ and $\{uh + j\}$,

(8.5)  $\lim_{s\to\infty} \pi_{r,s} = 1/n.$

By a natural extension of the lemma, we may regard $t$ trials as a single complex trial. Theorem (8.1) then has the force of (8.2).

9. Numerical results in various number systems. More efficient convergence formulae can be derived to meet special conditions. Those presented in (5) have the advantages of simplicity and generality. To test the efficiency of (5.16), several numerical examples, based upon unusual hypothetical probabilities, were worked by matrix multiplication as in (4.1). In these problems $p_{r,s} = p_r$, a constant, from trial to trial. A tabular comparison of the ranges, computed by (4.1), and the upper bounds, determined by (5.16), is presented in Table 1

for ^ f in 


10. Preparation and tests of a set of random digits. Since an unlimited number of valid tests for randomness may be devised, it is obvious that any finite set of digits cannot meet all such tests. As a matter of fact a truly random process should yield sets which fail to meet some proportion of the tests, the fraction being determined by the level of significance adopted in testing. No finite set of digits can be considered random; the tests for randomness are really applied to determine the character of the generating process. However, the concept of "locally random" sets as developed by Kendall and Smith [4] is useful, and some of their tests are used below as evidence that a set of numbers produced by compound randomization is likely to be locally random.

A mm random of wm dm mini digila Having Urn relative frequeucios 
indimiHl m tlio Plimol Inir of Tnhlr l won punched m cards and tabulated. 
TuUIh were uxhu fnruuli lm ‘uuImuuI ihnimininun Uiounitflposition o he 
rnuntir «*. .UL hi » imm*» nml. Ihwlqr a nt of 1-.000 <ligris 

The frequencies of the digits of the derived set are compared with those of the generating set in Table 2. The frequencies of the derived set are in accord with the hypothesis of equal probabilities.



TAHITI 1 


Companion oj xmpM row mdimmfofo r ««>« " 1 " 

Hypothetical numerical elamplct, ctmtlnnl praliiiMihf /m"‘ Inal In Ifml 


Hum 




Probqbilllj' In nn Individual Irml 


base 

jto l 

pi 

Pi 

P* 

Pi 

pi 

pt 

pi 

pi 

pi 

2 

800 

200 

— 



— 

— 

— 


— 

3 

500 

300 

200 

—• 

— 

— 


— 

— 

— 

3 

070 

020 

010 


— 

— 

—- 

— 

'—■ 


3 

400 

300 

300 

— 

— 

— 

— 

1 



4 

200 

100 

400 

300 

— 

— 

— 

— 

— 

■- 

5 

050 

200 

400 

020 

330 

_ 

— 


— 

- 

6 

080 

240 

300 

020 

200 

100 

— 


— 

— 

7 

300 

020 

240 

050 

130 

170 

090 

— 


— 

8 

200 

050 

000 

180 

100 

090 

150 

no 


— 

9 

030 

,080 

150 

000 

140 

000 

100 

050 

,210 

—■ 

10 

050 

150 

200 

050 

050 

120 

OSD 

020 

ISO 

,100 

10 

010 

020 

030 

010 

050 

000 

070 

080 

0D0 

550 

1G 

110 

HO 

uo 

110 

UO 

110 

110 

110 

UO 

ou 

10 

150 

150 

150 

15Q 

150 

050 

050 

060 

050 

050 

10 + 

014 

171 

164 

181 

023 

095 

017 

205 

089 

008 

12 

010 

070 

120 

100 

050 

020 

090 

010 

080 

no 


^ » 


fA.j 


Pi Pa 


— | — 1(701 il> I NKJ 


- I — j iHUNirri'XIlM ,HM 

— i - ' iiOmruHn' i*ni<* iji. ti«a 1 on 

1 

(HHKHIVVOS iNMHHIHJiiil I ](t(| 


(NMj7s 7^177 Ol r M»Mii ( v\ H«<| 

J 

(hkkm^I7j MOft)iiv»i7n' wxi 

I 

CKHJ1 77 ^ 11 ' rnij ( \L»-i;,in i Mq 
(MHMINHKWij IHMKijr^’l ' JWl 
(KHWUiWiJs 1 IK*r,J7M||H 17(| 

t 

iLNji;, r t jn 7<«i 

1KHKH KHW km <»lnhK«0<w*U 11 Ml 

; IKuniilWJ’i 7*j 

iKMwrioiMii omT.vi'iiu mn 


000 MW (HXXHHLWifl IHwr»fV)Wi ,VHJ 


* This badly biased set of probabilities was used to produce the set of random decimal digits tested in the next section.


TABUS 2 


Digit 

0 

l 

2 

a 

4 

Generating set 

014 

171 

104 

L 84 

023 

Derived set , 

088 

112 

080 

1O0 

.110; 


H 


0 


Frequency test (clenved set) x 2 = 7 0 

TABLE 3 


18'1 023[ (RV (117 USU OOK 


e.;i 


?th digit 


0 

1 

2 

3 

4 

5 
G 

7 

8 


(i + l)lll lilftlL 


11 

10 

11 

9 

G 

9 

a 

13 

7 


8 

13 
10 
10 
12 
17 

14 
10 

8 


7 

1C 

7 
3 

10 

11 

9 

9 

8 


3 

1 

fi 

11 

ft 

‘ 

‘ 8 

l *1 

7 

5 

7 

12 

12 

11 

K 

9 

11 

J 1 

11 

H 

in 

1 11 

10 

10 

7 

0 

11 

1 7 

1 il 

H 

12 

17 

0 

8 

■ u 

f 12 

10 

19 

(i 

10 

U 

, ™ 

! 7 

14 

10 

0 

5 

IT; 

t 

(1 

: 0 

0 

14 

10 

15 

8 

: <’ 

! 10 

9 

8 

11 

7 

12 

7 

i 12 

12 

9 

11 

’LL 

M 

1 m 






In the serial test adjacent pairs of digits are tabulated. The distribution of these pairs in the derived set appears in Table 3. This test indicates that adjacent digits are independent.


TABLE <1 
Gap test 


i 



Length of gap 


' 


uitfit , 


0-1 j 

2 l 

5-7 

8 nii(l 

OVQI 


P 

I 



Frequencies 




0 1 
i 

( )Ipm*i\ ed 

Kxpeeled 

10 

10.53 

18 

19.10 

n 1 

13.92 

42 

37 45 

1 25 

75 

I 1 

Dimmed 

I'Apeeted 

27 

21 09 

27 

2*1,37 

21 

17,70 

3G 

47 78 

5 44 

15 

2 | 

j 

< tbhrrved 

Kxpri fed 

10 

10 10 

17 

18 GO 

10 

13 GO 

42 

30 50 

1 90 

,60 

i 

3 

i 

Dimmed 

1 I’Apicieil 

1 

19 

1 10 70 

20 

22,83 

18 

16.04 

41 

44 77 

90 

92 

* ! 

! 

l 

i (Ibsmed 
j lwpccled 

i 

31 

21 28 

17 

21,09 

20 

17 92 

44 

48 21 

7,39 

,06 

r, 

f 

j Dimmed 
, bxpci led 

15 | 

10 10 

21 

22.17 

15 

I 10 ZG 

50 

43 48 

2 04 

.57 

0 

' dimmed 
l , '\pr , eli 1 d 

5 

• Dimmed 
| I'Apei'thl 

27 

11) 00 

25 

21 95 

12 

lb 00 

30 

43.05 

5 95 

12 

7 

20 

18 13 

19 

21 29 

10 

15 52 

42 

41 70 

40 

.93 

H 

i 

< 

| \ llmi j \rd 
j K\jut led 

n 

18 21 

19 

2 ) 07 

21 

15,30 

42 

41 32 

3 27 

35 

l) 

ObseiM'd 

Kxpreled 

18 

18,43 

18 

21 29 

21 

15 52 

40 

1 41 76 

2 53 

,48 


The gap test is based upon the distribution of lengths of intervals between like digits. A comparison of the number of gaps of specified lengths and the expected number in each case is presented in Table 4. The results of this test are also in accord with the assumption of local randomness. Noting the badly biased probabilities of the initial set of digits, the results of these tests demonstrate the effectiveness of the compound randomization process.

The use of tabulating equipment for producing random decimal digits by addition modulo 10 is relatively fast and simple. The authors have just completed production of a set of 105,000 digits in less than two days' tabulating time. 75,000 cards, representing approximately 3 months' records of a current carload waybill study, were used to generate the digits, 14 non-correlated columns being added simultaneously. A chain of length 10 was used, although the nature of the initial data was such that a shorter length would probably have given satisfactory results. The derived set is now recorded on 1500 cards, 70 digits per card. Preliminary tests for local randomness confirm the random nature of the generating process. Upon completion of the tests this will be reproduced in tabular form.



REFERENCES

[1] H. B. Horton, "A method for obtaining random numbers," Annals of Math. Stat., Vol. 19 (1948), pp. 81-85.

[2] J. E. Walsh, "Concerning compound randomization in the binary system," unpublished manuscript, Project RAND, Douglas Aircraft Co., Santa Monica, California.

[3] R. von Mises, Probability, Statistics and Truth, The Macmillan Co., New York, 1939.

[4] M. G. Kendall and B. B. Smith, "Randomness and random sampling numbers," Roy. Stat. Soc. Jour., Vol. 101 (1938), pp. 147-166.

[5] M. G. Kendall and B. B. Smith, "Second paper on random sampling numbers," Supp. to Roy. Stat. Soc. Jour., Vol. 6 (1939), pp. 51-61.

[6] G. U. Yule, "A test of Tippett's random sampling numbers," Roy. Stat. Soc. Jour., Vol. 101 (1938), pp. 167-172.

[7] C. W. Vickery, "On drawing a random sample from a set of punched cards," Supp. to Roy. Stat. Soc. Jour., Vol. 6 (1939), pp. 62-66.



ON A MATCHING PROBLEM ARISING IN GENETICS

By Howard Levene

Columbia University

1. Summary. A statistic useful for detecting deviations from the Hardy-Weinberg equilibrium in population genetics is discussed. Both exact and asymptotic distributions are given and a special case where there is misclassification is discussed. The distribution obtained also arises from a certain card matching problem.

2. Introduction. A system of multiple alleles behaves as follows under Mendelian inheritance. There are $r$ distinct forms or alleles, $a_1, \ldots, a_r$, of a given gene. A given individual contains two genes and can be represented as $a_i/a_j$. If $i = j$ the individual is called a homozygote; if $i \ne j$ it is called a heterozygote. The representation $a_i/a_j$ is called the genotype. In reproduction each gamete produced by an $a_i/a_j$ individual contains one gene which has a probability 1/2 of being $a_i$ and 1/2 of being $a_j$. In fertilization a paternal and a maternal gamete fuse to form a new individual which contains two genes, giving the well-known Mendelian ratios. We now consider a large random breeding population of $N$ individuals. This will contain $2N$ genes, of which the proportion $q_i$ will be of type $a_i$ $(i = 1, \ldots, r;\ \sum q_i = 1)$. The probability that a random individual from the next generation will be $a_i/a_j$ is $q_i^2$ $(i = j)$ or $2 q_i q_j$ $(i \ne j)$, which are known as the Hardy-Weinberg equilibrium probabilities. The statistical problem arose in testing (by means of a sample of $n$ individuals) the hypothesis that the Hardy-Weinberg ratio holds against the alternative hypothesis that disturbing forces decrease the number of homozygotes. The actual data has been discussed elsewhere [1].
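The Hardy-Weinberg equilibrium probabilities are simple to tabulate. The sketch below returns the probability of each genotype $a_i/a_j$ ($i \le j$) from the allele proportions; alleles are indexed from 0 here, and the function name and the example proportions are hypothetical.

```python
from fractions import Fraction

def hardy_weinberg(q):
    """Genotype probabilities {(i, j): prob} for i <= j, given allele proportions q.

    Homozygotes a_i/a_i have probability q_i**2; heterozygotes a_i/a_j
    (i < j) have probability 2*q_i*q_j.
    """
    r = len(q)
    return {(i, j): (q[i] * q[i] if i == j else 2 * q[i] * q[j])
            for i in range(r) for j in range(i, r)}

# Hypothetical proportions for three alleles.
q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
probs = hardy_weinberg(q)
print(probs)
print(sum(probs.values()))                        # total probability
print(sum(probs[(i, i)] for i in range(len(q))))  # expected proportion of homozygotes
```

The last line gives the homozygote proportion expected under equilibrium, the quantity against which a deficiency of homozygotes would be judged.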


3. The sample distribution of number of homozygotes. We shall assume throughout this paper that $N$ is so large that random fluctuations in the population proportions from generation to generation can be ignored. Let $x_{ij}$ $(i \le j = 1, \ldots, r)$ be the number of $a_i/a_j$ individuals in the sample, and let $y_i = x_{ii} + \sum_{j=1}^{r} x_{ij}$ (with $x_{ji} = x_{ij}$) be the number of $a_i$ genes in the sample. We have $\sum_{i \le j} x_{ij} = n$ and $\sum_i y_i = 2n$. Let $h = \sum_i x_{ii}$ be the number of homozygotes, and $z = n - h$ be the number of heterozygotes in the sample. The probability of the observed sample is

(1)  $P = \dfrac{n!}{\prod_{i \le j} x_{ij}!} \prod_{i} (q_i^2)^{x_{ii}} \prod_{i<j} (2 q_i q_j)^{x_{ij}} = \dfrac{n!\, 2^z}{\prod_{i \le j} x_{ij}!} \prod_{i} q_i^{y_i}.$


01 





Since the $q_i$ are unknown we use the conditional probability when $y_1, \dots, y_r$ are held constant. Whenever we use the word "conditional" hereafter, this condition will be understood. The conditional probability is

(2) $P = K \dfrac{n!\,2^z}{\prod_{i \le j} x_{ij}!},$

where

$\dfrac{1}{K} = \sum{}' \dfrac{n!\,2^z}{\prod_{i \le j} x_{ij}!},$

where the summation $\sum'$ is over all non-negative integral values of the $x_{ij}$ subject to the condition

$2x_{ii} + \sum_{j < i} x_{ji} + \sum_{j > i} x_{ij} = y_i \quad (i = 1, \dots, r).$

Consider

(3) $\left(\sum_i t_i\right)^{2n} = \left(\sum_i t_i^2 + 2\sum_{i < j} t_it_j\right)^n$

(4) $\qquad = \sum{}^{*} \dfrac{n!\,2^z}{\prod_{i \le j} x_{ij}!} \prod_i t_i^{y_i},$

where the summation $\sum^{*}$ is over all non-negative values of the $x_{ij}$ subject to the condition $\sum_{i \le j} x_{ij} = n$. Evidently $1/K$ is the coefficient of $\prod_i t_i^{y_i}$ in (4), but this must equal the coefficient of this term in the left member of (3), and thus $1/K = (2n)!/\prod_i y_i!$. Hence the conditional probability of the observed sample is

(5) $P = \dfrac{n!\,2^z \prod_i y_i!}{(2n)!\,\prod_{i \le j} x_{ij}!}.$

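For any function $u(x_{11}, \dots, x_{ij}, \dots, x_{rr})$ we will now let $E(u)$ and $\sigma^2(u)$ denote the conditional mean and variance of $u$ for fixed $y_i$, and will refer to them simply as the mean and variance. We first obtain the $s$th factorial moment of $x_{ii}$, that is $E(x_{ii}^{(s)})$, where $a^{(s)} = a(a - 1) \cdots (a - s + 1)$. Consider

(6) $E(x_{ii}^{(s)}) = K \sum{}' x_{ii}^{(s)} \dfrac{n!\,2^z}{\prod x_{jk}!} = K\,n^{(s)} \sum{}' \dfrac{(n - s)!\,2^z}{\prod v_{jk}!},$

where $v_{jk} = x_{jk}$ except that $v_{ii} = x_{ii} - s$, and $\sum'$ has the same meaning as in (2). The right member of (6) is evaluated in the same way as was $1/K$, giving

(7) $E(x_{ii}^{(s)}) = \dfrac{n^{(s)}\,y_i^{(2s)}}{(2n)^{(2s)}}.$

From this expression we obtain

(8) $E(x_{ii}) = \dfrac{y_i(y_i - 1)}{2(2n - 1)}$

and

(9) $\sigma^2(x_{ii}) = \dfrac{n^{(2)} y_i^{(4)}}{(2n)^{(4)}} + \dfrac{n\,y_i^{(2)}}{(2n)^{(2)}} - \left[\dfrac{n\,y_i^{(2)}}{(2n)^{(2)}}\right]^2 = n f_i^2 (1 - f_i)^2 + O(1),$

where $f_i = y_i/2n$ is the sample estimate of $q_i$. Similarly

(10) $E(x_{ii}x_{jj}) = \dfrac{n^{(2)} y_i^{(2)} y_j^{(2)}}{(2n)^{(4)}},$

giving

(11) $\sigma(x_{ii}, x_{jj}) = \dfrac{n^{(2)} y_i^{(2)} y_j^{(2)}}{(2n)^{(4)}} - \dfrac{y_i^{(2)} y_j^{(2)}}{4(2n - 1)^2}.$

Other moments can be similarly evaluated, in particular $E(x_{ij}) = y_iy_j/(2n - 1)$.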
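The factorial-moment formula (7) can be checked numerically. The following Python sketch is an illustration only: the function names are hypothetical and not part of the paper, and it assumes $n \ge 2$ so that every denominator is positive. It evaluates (7) in exact rational arithmetic and sums the means, variances and covariances of the $x_{ii}$ to give the conditional mean and variance of $h = \sum x_{ii}$.

```python
from fractions import Fraction


def falling(a, s):
    """Falling factorial a^(s) = a (a - 1) ... (a - s + 1)."""
    out = 1
    for k in range(s):
        out *= a - k
    return out


def homozygote_moments(y):
    """Exact conditional mean and variance of h for gene counts y = (y_1, ..., y_r)."""
    two_n = sum(y)
    if two_n % 2 or two_n < 4:
        raise ValueError("sum(y) must be even and at least 4 (n >= 2)")
    n = two_n // 2

    def factorial_moment(y_i, s):
        # Equation (7): E(x_ii^(s)) = n^(s) y_i^(2s) / (2n)^(2s)
        return Fraction(falling(n, s) * falling(y_i, 2 * s), falling(two_n, 2 * s))

    means = [factorial_moment(y_i, 1) for y_i in y]
    # Variance from the first two factorial moments: E(x^(2)) + E(x) - E(x)^2
    variances = [factorial_moment(y_i, 2) + m - m * m for y_i, m in zip(y, means)]

    mean_h = sum(means)
    var_h = sum(variances)
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            # E(x_ii x_jj) minus the product of the means gives the covariance
            e_prod = Fraction(
                falling(n, 2) * falling(y[i], 2) * falling(y[j], 2),
                falling(two_n, 4),
            )
            var_h += 2 * (e_prod - means[i] * means[j])
    return mean_h, var_h


if __name__ == "__main__":
    print(homozygote_moments([3, 2, 1]))
```

For gene counts (3, 2, 1) the mean is 4/5, which agrees with $\sum_i y_i(y_i - 1)/[2(2n - 1)]$ obtained by summing (8) over $i$.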


4. Asymptotic distribution of number of homozygotes. From (8), (9), and (11) we may easily obtain

(12) $E(h) = (C - 2n)/(4n - 2),$

(13) $\sigma^2(h) = \sum_i \sigma^2(x_{ii}) + 2\sum_{i < j} \sigma(x_{ii}, x_{jj})$

(11) - J^r(M- 2 ) + ^(%+- 5 )-/;(' l -± 2 )}-i + oQ, 

where $C = \sum y_i^2$ and $D = \sum y_i^3$. The formula (14) is a close approximation to (13) and is easily computed. From (5), by means similar to those classically used to prove asymptotic normality of the binomial distribution, we can prove asymptotic normality of the conditional distribution of $h$; more precisely, if $n \to \infty$ and $y_i/n \to$ constant ($i = 1, \dots, r$), then the distribution of $[h - E(h)]/\sigma(h)$ tends to the normal distribution with zero mean and unit variance.



5. Effect of misclassification. There is a further complication in the particular case reported in [1]. All individuals of genotype $a_i/a_i$ are correctly classified, but an individual of genotype $a_i/a_j$ ($i \ne j$) has a known probability $p/2$ of being classified $a_i/a_i$ and an equal probability of being classified $a_j/a_j$. As a result, the observed proportion of homozygotes is a biased estimate of the proportion in the population. Let $h, x_{ij}, y_i$ denote the true sample values, and let $h', x'_{ij}, y'_i$ denote the recorded sample values. Then $h^* = h' - c$, where $c = (n - h')p/(1 - p)$, will give an unbiased estimate, i.e., $E(h^*) = E(h)$. In order to use $h^*$ we must have its (conditional) variance. Since $h^* = -np/(1 - p) + h'/(1 - p)$,

$\sigma^2_{h^*} = [1/(1 - p)]^2 \sigma^2_{h'}.$

Let $h' - h = e$; then for large fixed $(n - h)$, $e$ is approximately normally distributed with mean $(n - h)p$ and variance

$(n - h)p(1 - p) = [n - E(h)]p(1 - p) + O(\sqrt{n}).$

Neglecting the remainder term in this variance, $e$ and $h$ have a joint normal distribution with parameters that are easily calculated. We thus have

$\sigma^2_{h'} = \sigma^2_h + \sigma^2_e + 2\sigma(h, e)$, or $\sigma^2_{h'} = [n - E(h)]p(1 - p) + (1 - p)^2\sigma^2_h,$

giving

(16) $\sigma^2_{h^*} = \sigma^2_h + [n - E(h)]p/(1 - p).$

In [1] $\sigma^2_{h^*}$ was given as $\sigma^2_h + c$ for the sake of simplicity. This would tend to be smaller than (16), but only negligibly so. Strictly speaking the calculation of $E(h)$ and $\sigma^2_h$ from (12) and (14) requires a knowledge of the true $y_i$, but the observed $y'_i$ are unbiased estimates of the $y_i$ and their use should cause no serious trouble.
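The correction for misclassification is a short calculation. A minimal Python sketch follows; the function names are hypothetical, and the mean and variance of the true homozygote count $h$ are taken as given inputs rather than computed here.

```python
def corrected_homozygotes(n, h_obs, p):
    """Unbiased estimate h* = h' - (n - h') p / (1 - p) from the recorded count h'."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return h_obs - (n - h_obs) * p / (1 - p)


def corrected_variance(n, mean_h, var_h, p):
    """Equation (16): var(h*) = var(h) + [n - E(h)] p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return var_h + (n - mean_h) * p / (1 - p)


if __name__ == "__main__":
    # Illustrative numbers only: 100 individuals, 40 recorded homozygotes, p = 0.1
    print(corrected_homozygotes(100, 40, 0.1))
    print(corrected_variance(100, 30.0, 20.0, 0.1))
```

With $p = 0$ both functions return their inputs unchanged, as they should when no heterozygote can be recorded as a homozygote.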

6. Combinatorial statement of the problem. The problem may also be expressed as one of card matching as follows. A deck contains $2n$ cards of $r$ different suits, with $y_i$ cards of the $i$th suit ($i = 1, \dots, r$). We draw $n$ pairs of cards at random without replacement, exhausting the deck. What is the distribution of $h$, the number of twins (pairs in which both members are of the same suit)? If $z = n - h$, the probability of exactly $h$ twins is given by (5), and in the limit $h$ is normally distributed with mean given by (12) and variance given by (14). The card matching problem does not involve the notion of conditional probability. By introducing variables $u_\alpha$ equal to one if the $\alpha$th pair is a twin and zero otherwise, the moments of $h$ can also be obtained without using generating functions.
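For a small deck the card matching problem can be solved by brute force. The sketch below (hypothetical function name, Python standard library only) enumerates every way of splitting the deck into unordered pairs, which is equivalent to drawing the pairs at random without replacement, and tabulates the exact distribution of the number of twins. It is practical only for small decks, since the number of pairings grows as $(2n - 1)(2n - 3) \cdots 1$.

```python
from collections import Counter
from fractions import Fraction


def twin_distribution(y):
    """Exact distribution of h, the number of twins, for suit sizes y = (y_1, ..., y_r)."""
    deck = [suit for suit, count in enumerate(y) for _ in range(count)]
    if len(deck) % 2:
        raise ValueError("the deck must contain an even number of cards")
    counts = Counter()

    def pair_off(cards, twins):
        # Pair the first remaining card with each other card in turn; this
        # visits every splitting of the deck into unordered pairs exactly once.
        if not cards:
            counts[twins] += 1
            return
        first, rest = cards[0], cards[1:]
        for k, other in enumerate(rest):
            pair_off(rest[:k] + rest[k + 1:], twins + (first == other))

    pair_off(deck, 0)
    total = sum(counts.values())
    return {h: Fraction(c, total) for h, c in sorted(counts.items())}


if __name__ == "__main__":
    dist = twin_distribution([3, 2, 1])
    print(dist)
    print(sum(h * p for h, p in dist.items()))
```

For suit sizes (3, 2, 1) the 15 pairings give probabilities 2/5, 2/5 and 1/5 for $h = 0, 1, 2$, with mean 4/5, which matches the value $(\sum y_i^2 - 2n)/(4n - 2) = (14 - 6)/10$.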


REFERENCE

[1] Theodosius Dobzhansky and Howard Levene, "Genetics of natural populations. XVII. Proof of operation of natural selection in wild populations of Drosophila pseudoobscura," Genetics, Vol. 33 (1948), pp. 537-547.



A MULTIPLE DECISION PROCEDURE FOR CERTAIN PROBLEMS IN 

THE ANALYSIS OF VARIANCE 

By Edward Paulson
University of Washington

1. Introduction. In this paper we will discuss a certain type of problem which arises in many applications of the analysis of variance. We suppose that we are given K varieties and are required to investigate the differences among them on the basis of the observed yields from a given experimental design, such as a set of randomized blocks or a latin square. The classical procedure [1] for dealing with this problem has been to test the null hypothesis that the K varieties are all equal by computing the ratio of the mean sum of squares among varieties to the residual mean sum of squares, and rejecting the null hypothesis whenever this ratio exceeded the critical value corresponding to the level of significance used. However, the standard discussions of this procedure seem to be quite vague on the question of what action should be taken after the null hypothesis has been rejected.

In a number of problems, the practical situation seems to be such that instead of testing the null hypothesis that the varieties do not differ, what is really required is a statistical rule or "decision function" which on the basis of the observed yields will classify the K varieties into a "superior" group and an "inferior" group. If the superior group consists of more than one variety, the next appropriate action will of course depend on the particular problem at hand; in some situations the varieties in the superior group might then be subject to further selection on the basis of some secondary characteristic, or additional observations might be taken to discriminate between the members of the superior group, after discarding the varieties in the inferior group. However, if all varieties happen to be classified in one group, the group will be labelled "neutral" and this result is to be interpreted as implying that the varieties are homogeneous.

In this formulation, the problem is now of a multiple decision type; it is necessary to decide on the basis of a sample which one out of the $2^K - 1$ possible decisions (or classifications) to select. We will suggest a solution which seems quite reasonable on an intuitive basis, but it is still an open question whether this solution is an optimum one.

2. A special case. In this section we will discuss the problem under the assumption that the variance $\sigma^2$ of a single observation is known a priori. This is a rather restrictive assumption, but it can be considered as approximately satisfied when the number of degrees of freedom available for estimating the variance is large, which will often be the case. The minor modifications necessary to secure exact results for the small sample case when $\sigma$ is unknown are discussed in section 3. We also assume that the experimental design has been so selected that there will be the same number ($r$) of observations on each of the K varieties.

Now let $x_{i\alpha}$ = the $\alpha$th observation on the $i$th variety ($i = 1, 2, \dots, K$; $\alpha = 1, 2, \dots, r$), let $\bar{x}_i = \sum_{\alpha=1}^{r} x_{i\alpha}/r$, put $m_i = E(\bar{x}_i)$ where $E$ stands for expected value, and take $\lambda$ to be a given positive constant. The conventional assumption is made that all the observations are normally and independently distributed with the same variance $\sigma^2$. Denote by $\bar{x}_M$ the maximum of the K mean values $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_K$. The rule for dividing the varieties into superior and inferior groups is the following: the superior group is to consist of all varieties whose corresponding mean values fall in the interval $[\bar{x}_M - \lambda\sigma/\sqrt{r},\ \bar{x}_M]$ and the remaining varieties constitute the inferior group. (As mentioned earlier, if all the varieties fall into one group, this group is labelled 'neutral' and the varieties are considered homogeneous.)
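The classification rule can be written out in a few lines. The Python sketch below is illustrative; the function name and the dictionary it returns are assumptions made for the example, not notation from the paper.

```python
import math


def classify_varieties(means, sigma, r, lam):
    """Divide varieties into superior and inferior groups.

    means : observed variety means, one per variety
    sigma : known standard deviation of a single observation
    r     : number of observations on each variety
    lam   : the positive constant lambda of the rule
    """
    if sigma <= 0 or r <= 0 or lam <= 0:
        raise ValueError("sigma, r and lam must be positive")
    threshold = max(means) - lam * sigma / math.sqrt(r)
    superior = [i for i, m in enumerate(means) if m >= threshold]
    inferior = [i for i, m in enumerate(means) if m < threshold]
    if not inferior:
        # Every variety falls in one group, which is labelled neutral
        return {"neutral": superior}
    return {"superior": superior, "inferior": inferior}


if __name__ == "__main__":
    print(classify_varieties([10.0, 9.5, 7.0], sigma=1.0, r=4, lam=2.0))
```

With means 10.0, 9.5 and 7.0, $\sigma = 1$, $r = 4$ and $\lambda = 2$, the lower end of the interval is $10 - 2/2 = 9$, so the first two varieties form the superior group and the third is inferior.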

This rule completely determines the classification as soon as $\lambda$ is determined. For a given sample size, we might select $\lambda$ by considering the relative importance of different types of incorrect classifications. If $H$ denotes the error of misclassifying the varieties when in fact they are all equal, and $G$ denotes the error of misclassifying the varieties when they actually are unequal, then it is obvious that the greater the value of $\lambda$, the smaller the probability of an error of type $H$, but the greater the probability of an error of type $G$. Therefore for a given value of $r$ it is necessary to adopt some sort of compromise in selecting $\lambda$.

For a given value of $\lambda$ we will now derive explicit formulas for $P(H)$, the probability of not classifying all the varieties in one group when $m_1 = m_2 = \dots = m_K$, and for $P(G_1)$, the probability that as a result of the experiment there will not be a superior group consisting only of the Kth variety when $m_1 = m_2 = \dots = m_{K-1} = m$ and $m_K = m + \Delta$ ($\Delta > 0$). $G_1$ was selected because it appeared to be the particular kind of type $G$ error most likely to be useful in applications. Also $P(G_1)$ may be regarded as the least upper bound of the probability of misclassifying the varieties when one variety is superior to any of the others by an amount at least equal to $\Delta$. Now if we denote by $W = (\bar{x}_M - \bar{x}_{\min})$ the difference between the maximum and minimum values of the set $\{\bar{x}_i\}$ ($i = 1, 2, \dots, K$), then it is obvious that

(2.1) $1 - P(H) = P\left[\dfrac{W\sqrt{r}}{\sigma} \le \lambda\right].$

The right hand side of (2.1) is equivalent to the probability that the range of a sample of K independent observations from a normal distribution with unit variance is $\le \lambda$, and this has already been tabulated by Pearson and Hartley [2]. From these tables it is a routine matter to find $P(H)$ corresponding to a given value of $\lambda$, and conversely. To evaluate $P(G_1)$, we have

$1 - P(G_1) = P\left[\bar{x}_i < \bar{x}_K - \dfrac{\lambda\sigma}{\sqrt{r}} \text{ for each } i\ (i = 1, 2, \dots, K - 1)\right].$


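By evaluating the probability of this event for a fixed value of $\bar{x}_K$ and then integrating out with respect to $\bar{x}_K$, it is a simple matter to verify that

(2.2) $P(G_1) = 1 - \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \left[\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\,y + (\Delta/\sigma)\sqrt{r} - \lambda} e^{-t^2/2}\, dt\right]^{K-1} dy.$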
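Both error probabilities can be evaluated numerically without tables. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, `delta` stands for $(\Delta/\sigma)\sqrt{r}$, integration is by a composite Simpson rule over $[-10, 10]$, and $P(H)$ uses the standard integral form of the distribution of the range of K unit normal observations (a well-known formula, not one written out in the paper).

```python
import math


def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def prob_G1(K, lam, delta):
    """P(G_1) from (2.2), with delta = (Delta / sigma) * sqrt(r)."""
    integrand = lambda y: phi(y) * Phi(y + delta - lam) ** (K - 1)
    return 1.0 - simpson(integrand, -10.0, 10.0)


def prob_H(K, lam):
    """P(H) = 1 - P(range of K unit normal observations <= lam), as in (2.1)."""
    integrand = lambda y: K * phi(y) * (Phi(y + lam) - Phi(y)) ** (K - 1)
    return 1.0 - simpson(integrand, -10.0, 10.0)


if __name__ == "__main__":
    for lam in (1.0, 2.0, 3.0):
        print(lam, prob_H(4, lam), prob_G1(4, lam, 4.0))
```

For $K = 2$ both integrals have closed forms, $P(G_1) = 1 - \Phi((\delta - \lambda)/\sqrt{2})$ and $P(H) = 2 - 2\Phi(\lambda/\sqrt{2})$, which gives a convenient check on the quadrature.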


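In some applications, it may be desirable to have an explicit expression for the probability that the superior group will consist of the Kth variety and not more than $t$ inferior varieties when $m_1 = m_2 = \dots = m_{K-1} = m$ and $m_K = m + \Delta$. If we denote this probability by $1 - P_t$ it is not difficult to show that

$1 - P_t = \displaystyle\sum_{\alpha=0}^{t} \binom{K-1}{\alpha} [T_{1\alpha} + \alpha T_{2\alpha}],$ where

(2.3) $T_{1\alpha} = \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \left[\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\,y + (\Delta/\sigma)\sqrt{r} - \lambda} e^{-t^2/2}\, dt\right]^{K-\alpha-1} \left[\dfrac{1}{\sqrt{2\pi}} \int_{y + (\Delta/\sigma)\sqrt{r} - \lambda}^{\,y + (\Delta/\sigma)\sqrt{r}} e^{-t^2/2}\, dt\right]^{\alpha} dy,$ and

$T_{2\alpha} = \displaystyle\int_{-\infty}^{\infty} \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2} \left[\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\,y - \lambda} e^{-t^2/2}\, dt\right]^{K-\alpha-1} \left[\dfrac{1}{\sqrt{2\pi}} \int_{y - \lambda}^{\,y} e^{-t^2/2}\, dt\right]^{\alpha-1} \left[\dfrac{1}{\sqrt{2\pi}} \int_{y - (\Delta/\sigma)\sqrt{r} - \lambda}^{\,y - (\Delta/\sigma)\sqrt{r}} e^{-t^2/2}\, dt\right] dy.$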


3. General case. We now briefly discuss the exact treatment of the problem when $\sigma$ is unknown. The notation of section 2 will be used, but in addition denote by $s^2$ an estimate of $\sigma^2$ resulting from the given experimental design which is based on the residual sum of squares with $n$ degrees of freedom. It is well known that $s^2$ is independent of the set $\{\bar{x}_i\}$ ($i = 1, 2, \dots, K$). Now the rule to be used in classifying the varieties into two groups is as follows: the superior group is to consist of all those varieties whose mean values fall in the interval $[\bar{x}_M - \lambda s/\sqrt{r},\ \bar{x}_M]$, and the inferior group consists of the remaining varieties.

We now find that:

(3.1) $1 - P(H) = P[W \le \lambda s/\sqrt{r}].$

The right hand side of (3.1) depends only on the distribution of the 'studentized' range and has also been tabulated by Pearson and Hartley [3] although the tabulation is considerably less complete than that of the range in [2]. It is also easy to verify that the expression for $P(G_1)$ now becomes


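(3.2) $P(G_1) = 1 - \dfrac{n^{n/2}}{\Gamma(n/2)\,\sqrt{2\pi}\;2^{(n-2)/2}} \displaystyle\int_0^{\infty}\!\!\int_{-\infty}^{\infty} w^{n-1} e^{-nw^2/2}\, e^{-y^2/2} \left[\dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\,y + (\Delta/\sigma)\sqrt{r} - \lambda w} e^{-t^2/2}\, dt\right]^{K-1} dy\, dw$

with a similar modification for $P_t$.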





4. Remarks. Any application of the ideas suggested here would be greatly facilitated if tables of $P(G_1)$ were made available. If this were done, it would be possible to decide in advance of an experiment how large $r$ should be in order to have a fixed control over both types $H$ and $G_1$ errors. It is obvious that further research both along theoretical and applied lines is needed. In conclusion, the writer would like to thank Professor Albert Bowker for several helpful suggestions.


REFERENCES 

[1] R. A. Fisher, Statistical Methods for Research Workers, Chapters 7, 8.

[2] E. S. Pearson and H. O. Hartley, "Tables of the probability integral of the range in samples from a normal population," Biometrika, Vol. 32 (1941-42), pp. 301-310.

[3] E. S. Pearson and H. O. Hartley, "Tables of the probability integral of the studentized range," Biometrika, Vol. 33 (1943), pp. 89-99.



A MODIFIED EXTREME VALUE PROBLEM

By Benjamin Epstein 1

Coal Research Laboratory, Carnegie Institute of Technology

1. Introduction and summary. Consider the following problem.

Particles are distributed over unit areas in such a way that the number of particles to be found in such areas is a random variable following the law of Poisson, with $\nu$ equal to the expected number of particles per unit area. Furthermore, the particles themselves are assumed to vary in magnitude according to a size distribution specified (independently of the particular unit area chosen) by a d.f. $F(x)$, defined over some interval $a \le x \le b$, with $F(a) = 0$ and $F(b) = 1$. The problem is to find the distribution of the smallest, largest, or more generally the $n$th smallest or $n$th largest particle in randomly chosen unit areas.

The problem as stated is not completely specified. To specify the distribution of smallest or largest particles in a unit area one must give a rule for dealing with those areas which contain no particles at all. More generally, in the case of the distribution of the $n$th smallest or $n$th largest particle, one must give a rule for dealing with those areas which contain $(n - 1)$ or fewer particles. There are at least two possible alternatives. One alternative is to omit none of the areas from consideration by setting up the following rule: if no particles are found in a given unit area then this area will be considered as one for which the smallest size particle is $x = b$ and for which the largest size particle is $x = a$. More generally, if $(n - 1)$ or fewer particles are found in a given unit area then this area will be considered as one for which the $n$th smallest size particle is $x = b$ and for which the $n$th largest size particle is $x = a$. A second alternative is to restrict attention to those areas which contain at least one particle (in the case of the distribution of smallest or largest values) or at least $n$ particles (in the case of the distribution of the $n$th smallest or $n$th largest particle). In other words, this means finding the relevant conditional distribution.

From the point of view of the application of the theory of extreme values to 
fracture problems, there are some situations where the first model and other 
situations where the second model is the more appropriate in describing the 
phenomenon under investigation. In this paper section 2 will be devoted to a 
derivation of the distributions associated with the first alternative; in section 3
the conditional distributions will be described briefly. 

2. The distributions under the first alternative. In this section we shall be concerned with the first alternative. To find the distribution of the $n$th smallest particle in unit areas, we first observe (the verification is left to the


1 Present address, Department of Mathematics, Wayne University, Detroit, Michigan. 





reader) that under the hypotheses of section 1, the number of particles having size $\le x$ in a unit area is distributed according to the law of Poisson, with expected number equal to $\nu F(x)$. Next we note that the probability that the $n$th smallest particle in a unit area exceeds $x$ in size is equal to the probability of finding exactly 0, or exactly 1, or exactly 2, ..., or exactly $(n - 1)$ particles of size $\le x$ in that area. Therefore $G_n(x)$, the probability that the $n$th smallest size particle in a unit area is $\le x$, is given by

(1) $G_n(x) = 1 - \displaystyle\sum_{j=0}^{n-1} e^{-\nu F(x)} \dfrac{(\nu F(x))^j}{j!}, \quad x < b;$

$\qquad\quad = 1, \quad x \ge b,$

where we have assigned to the size $x = b$ the probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$, which is just equal to the probability of finding fewer than $n$ particles in a unit area.

If the d.f. $F(x)$ has a derivative $f(x)$ for all $x$ lying in $a \le x \le b$, then $G_n(x)$ has a derivative for any value of $x \ne b$. Therefore the probability density for the $n$th smallest size particle is, for any $x \ne b$, given by the function $g_n(x)$ where

(2) $g_n(x) = \dfrac{\nu^n [F(x)]^{n-1} e^{-\nu F(x)}}{(n-1)!}\, f(x), \quad a \le x < b;$

$\qquad\quad = 0, \quad x < a,\ x > b.$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $x = b$.

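If one makes the transformation $y = \nu F(x)$ (for a similar transformation in extreme value theory see [1, page 371]), then (1) and (2) become

(1') $G_n(y) = 1 - \displaystyle\sum_{j=0}^{n-1} e^{-y} \dfrac{y^j}{j!}, \quad y < \nu;$

$\qquad\quad = 1, \quad y \ge \nu,$

and

(2') $g_n(y) = \dfrac{e^{-y} y^{n-1}}{(n-1)!}, \quad 0 \le y < \nu;$

$\qquad\quad = 0, \quad y < 0,\ y > \nu.$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $y = \nu$.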

The distribution of the smallest size particle in a randomly chosen area is 
found by letting n = 1 in equation 1. 
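Equation (1) is a Poisson tail probability and is easy to evaluate. The Python sketch below is illustrative only: the function names are hypothetical, and the size distribution `F` is supplied by the caller as an ordinary function on $[a, b]$.

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson variable N with the given mean (zero if k < 0)."""
    term, total = math.exp(-mean), 0.0
    for j in range(k + 1):
        total += term
        term *= mean / (j + 1)
    return total


def nth_smallest_cdf(x, n, nu, F, a, b):
    """G_n(x) of equation (1): d.f. of the nth smallest particle under the first alternative."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return 1.0 - poisson_cdf(n - 1, nu * F(x))


def atom_at_b(n, nu):
    """Probability assigned to x = b, i.e. of finding fewer than n particles."""
    return poisson_cdf(n - 1, nu)


if __name__ == "__main__":
    uniform = lambda x: x  # assumed example: sizes uniform on [0, 1]
    print(nth_smallest_cdf(0.5, 1, 3.0, uniform, 0.0, 1.0))
    print(atom_at_b(1, 3.0))
```

For $n = 1$ the d.f. reduces to $1 - e^{-\nu F(x)}$ for $x < b$, and the jump at $x = b$ equals $e^{-\nu}$, the probability that the area contains no particle.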

In a similar way one can find the distribution of the $n$th largest particle in a unit area. In this case $H_n(x)$, the probability that the $n$th largest size particle in a unit area is $\le x$, is given by





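(3) $H_n(x) = 0, \quad x < a;$

$\qquad\quad = \displaystyle\sum_{j=0}^{n-1} e^{-\nu[1 - F(x)]} \dfrac{\{\nu[1 - F(x)]\}^j}{j!}, \quad x \ge a,$

where we have assigned to the size $x = a$ the probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$.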
If, as before, $F(x)$ is assumed to have a derivative $f(x)$ for all $x$ lying in $a \le x \le b$, then the probability density for the $n$th largest size particle is, for any $x \ne a$, given by the function $h_n(x)$ where

(4) $h_n(x) = \dfrac{\nu^n [1 - F(x)]^{n-1} e^{-\nu[1 - F(x)]}}{(n-1)!}\, f(x), \quad a < x \le b;$

$\qquad\quad = 0, \quad x < a,\ x > b.$

A finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ is assigned to $x = a$.

If one makes the transformation $z = \nu[1 - F(x)]$, then (3) and (4) become

(3') $H_n(z) = 1 - \displaystyle\sum_{j=0}^{n-1} e^{-z} \dfrac{z^j}{j!}, \quad z < \nu;$

$\qquad\quad = 1, \quad z \ge \nu,$

and

(4') $h_n(z) = \dfrac{e^{-z} z^{n-1}}{(n-1)!}, \quad 0 \le z < \nu;$

$\qquad\quad = 0, \quad z < 0,\ z > \nu,$

with a finite probability $\sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!$ assigned to $z = \nu$.

The distribution of the largest size particle in a randomly chosen unit area is found by letting $n = 1$ in equation (3).


3. Conditional distributions of the extreme values. The appropriate conditional distributions for the problem under consideration can be written down readily. The step function component which occurred in section 2 is no longer present since we restrict our attention only to those areas which contain at least $n$ particles (in the general case of the distribution of $n$th smallest or $n$th largest size particles).

$G_n^c(x)$, the d.f. of the $n$th smallest particle in a unit area chosen at random from the class of areas containing at least $n$ particles, is given by

(5) $G_n^c(x) = 0, \quad x < a;$

$\qquad\quad = \dfrac{1 - \sum_{j=0}^{n-1} e^{-\nu F(x)} (\nu F(x))^j/j!}{1 - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}, \quad a \le x \le b;$

$\qquad\quad = 1, \quad x > b.$





Similarly $H_n^c(x)$, the d.f. of the $n$th largest particle in a unit area chosen at random from the class of areas containing at least $n$ particles, is given by

(6) $H_n^c(x) = 0, \quad x < a;$

$\qquad\quad = \dfrac{\sum_{j=0}^{n-1} e^{-\nu[1 - F(x)]} [\nu(1 - F(x))]^j/j! - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}{1 - \sum_{j=0}^{n-1} e^{-\nu}\nu^j/j!}, \quad a \le x \le b;$

$\qquad\quad = 1, \quad x > b.$
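The conditional distribution functions (5) and (6) differ from the unconditional ones only by removal of the probability of finding fewer than $n$ particles. A minimal Python sketch, with hypothetical function names and a caller-supplied size distribution `F` on $[a, b]$:

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson variable N with the given mean (zero if k < 0)."""
    term, total = math.exp(-mean), 0.0
    for j in range(k + 1):
        total += term
        term *= mean / (j + 1)
    return total


def conditional_nth_smallest_cdf(x, n, nu, F, a, b):
    """Equation (5): nth smallest particle, given at least n particles in the area."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    fewer_than_n = poisson_cdf(n - 1, nu)
    return (1.0 - poisson_cdf(n - 1, nu * F(x))) / (1.0 - fewer_than_n)


def conditional_nth_largest_cdf(x, n, nu, F, a, b):
    """Equation (6): nth largest particle, given at least n particles in the area."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    fewer_than_n = poisson_cdf(n - 1, nu)
    return (poisson_cdf(n - 1, nu * (1.0 - F(x))) - fewer_than_n) / (1.0 - fewer_than_n)


if __name__ == "__main__":
    uniform = lambda x: x  # assumed example: sizes uniform on [0, 1]
    print(conditional_nth_smallest_cdf(0.5, 1, 2.0, uniform, 0.0, 1.0))
    print(conditional_nth_largest_cdf(0.5, 1, 2.0, uniform, 0.0, 1.0))
```

Both functions are continuous at $x = a$ and $x = b$, which reflects the disappearance of the step function component once areas with fewer than $n$ particles are excluded.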


4. General remarks and an application. It is interesting to note that the assumptions of section 1 lead to distribution functions in section 2 which are precisely the same as the asymptotic distributions of smallest, largest, or $n$th smallest, or $n$th largest values in samples of fixed size $N$ ($N \to \infty$) (see e.g. [1, p. 371]). In the problem treated in this paper, $\nu$, the expected number of particles in a unit area, plays the role of $N$ in the fixed sample size case, with the important difference that the distributions in the present paper are exact and not merely asymptotic.


The results of this paper have a direct bearing on certain aspects of fracture problems [2] and in particular on the dielectric breakdown of capacitors [3]. In the latter problem there appears to be ample justification for assuming that the breakdown voltage is influenced to a considerable degree by the presence of flaws known in the technical literature as conducting particles. These particles are spread individually and collectively at random throughout the area of the capacitor and, depending on their size, create a local weakening of the capacitor by reducing the nominal insulation thickness in the neighborhood of flaws. The voltage required to break down the capacitor is equal to that required to break it down at that spot where the greatest penetration has taken place.

In the dielectric problem the statistical distribution of largest values appropriate to the problem is given by (3) with $n = 1$, and the size distribution of conducting particles follows a law of the form $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$. This is a situation where all the capacitors under test are part of the sample (since all must be tested to destruction) and those which happen to contain no defects (an event with probability $e^{-\nu}$) act as if the largest particle size is equal to $x = 0$. $e^{-\nu}$ simply represents the expected fraction of capacitors which have strength equal to the theoretical strength of the insulation.

The conditional distributions of section 3 would be more appropriate in the following sort of practical situation. Suppose that surface flaws spread at random on glass rods are known to reduce greatly the strength of the rods, and that one takes out by some method of inspection those specimens which have no flaws. Then the strength distribution of the remaining specimens is a conditional distribution since each specimen must contain at least one flaw to be eligible as a member of the sample.





REFERENCES

[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] B. Epstein, "Statistical aspects of fracture problems," J. Applied Phys., Vol. 19 (1948), pp. 140-147.

[3] B. Epstein and H. Brooks, "The theory of extreme values and its implications in the study of the dielectric strength of paper capacitors," J. Applied Physics, Vol. 19 (1948), pp. 544-550.



ON DISTINCT HYPOTHESES 


By Agnes Berger and Abraham Wald 
Columbia University 


1. Introduction. The following problem was suggested to one of the authors by Professor Neyman:

Let $X = (X_1, X_2, \dots, X_n)$ be a chance vector and let $h$ denote any simple hypothesis specifying its distribution. Let $H_i$ be the composite hypothesis that some element $h$ of a set of simple hypotheses $\{h\}_i$, ($i = 0, 1$), is true, and assume that $H_0$ and $H_1$ are known to be exhaustive. Let $h_i$ denote an element of $\{h\}_i$ ($i = 0, 1$).

For any region $W$ of the sample space $S$, let $P(W \mid h)$ be the probability that the sample point falls in $W$ when $h$ is true.

We shall call $H_0$ and $H_1$ distinct, if a region $W$ exists for which

$P(W \mid h_0) \ne P(W \mid h_1)$, for all $h_0 \in \{h\}_0$ and all $h_1 \in \{h\}_1$.

The problem is to establish necessary and sufficient conditions for two composite hypotheses $H_0$ and $H_1$ to be distinct.

For any critical region $W$ for testing $H_0$ against $H_1$, let $\gamma(W \mid h)$ be the probability of a wrong decision when $h$ is true, i.e.

$\gamma(W \mid h) = P(W \mid h)$ for $h \in H_0$,
$\gamma(W \mid h) = 1 - P(W \mid h)$ for $h \in H_1$.

Suppose now that $H_0$ and $H_1$ are not distinct. Then to any $W$ a pair $h_0'$, $h_1'$ exist such that

$P(W \mid h_0') = P(W \mid h_1'),$

thus

$\gamma(W \mid h_0') = 1 - \gamma(W \mid h_1'),$

and therefore

(1.1) $\text{l.u.b.}_h\ \gamma(W \mid h) \ge \tfrac{1}{2}$ for any $W$.

.ssass2s.ts.*:a£- 

2. A lemma. We shall now prove the following lemma:

Lemma 2.1. Let $H_i$ be the simple hypothesis that the density of $X$ is $p_i(x)$, ($i = 0, 1$). Assume that the set $R$ of $x$'s satisfying $p_0(x) \ne p_1(x)$ has a positive measure. Then there exists a region $W$ such that $\gamma(W \mid H_i) < \tfrac{1}{2}$, $i = 0, 1$.

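Proof: Let $R_0$ be defined by $p_0 = p_1$, $R_1$ by $p_0 < p_1$, $R_2$ by $p_0 > p_1$. Since $\int_S p_i\,dx = 1$ ($i = 0, 1$), $R_1$ and $R_2$ are of positive measure. Let

$\phi(x) = p_1$ in $R_1$; $\quad \phi(x) = p_0$ in $R_2$; $\quad \phi(x) = p_0 = p_1$ in $R_0$.

Then $\int_S \phi(x)\,dx > 1$ and either

a) $\int_{R_1 + R_0} p_1\,dx > \tfrac{1}{2}$ or b) $\int_{R_2} p_0\,dx > \tfrac{1}{2}$

or both. Assume first a).

Let $R_3 \subset R_1 + R_0$ and such that $\int_{R_3} p_1\,dx = \tfrac{1}{2}$, but $\int_{R_3} p_0\,dx < \tfrac{1}{2}$. This can be done by including into $R_3$ a part of $R_1$ of non-zero measure. Let $R_4 \subset R_1 + R_0 - R_3$ and such that $0 < \int_{R_4} p_1\,dx < \tfrac{1}{2} - \int_{R_3} p_0\,dx$. Then

$\int_{R_4} p_0\,dx \le \int_{R_4} p_1\,dx < \tfrac{1}{2} - \int_{R_3} p_0\,dx$, thus $\int_{R_3 + R_4} p_0\,dx < \tfrac{1}{2}$ but $\int_{R_3 + R_4} p_1\,dx > \tfrac{1}{2}$.

Assume now b).

Let $R_5 \subset R_2$ and such that $\int_{R_5} p_0\,dx = \tfrac{1}{2}$. Then $\int_{R_5} p_1\,dx < \tfrac{1}{2}$. Let $R_6 \subset R_2 - R_5$ and such that $0 < \int_{R_6} p_0\,dx < \tfrac{1}{2} - \int_{R_5} p_1\,dx$. Then

$\int_{R_5 + R_6} p_0\,dx > \tfrac{1}{2}$ and $\int_{R_5 + R_6} p_1\,dx < \tfrac{1}{2}$.

Thus in case a) $W = R_3 + R_4$, and in case b) $W = S - R_5 - R_6$ is a critical region for which $\gamma(W \mid H_i) < \tfrac{1}{2}$ ($i = 0, 1$). This proves the lemma.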


3. The main theorem. Assume now $X$ to have a density function $p(x \mid \theta)$ where $\theta = (\theta_1, \theta_2, \dots, \theta_k)$ is an unknown parameter point. Let $\omega_0$ and $\omega_1$ be two disjoint, bounded and closed subsets of the $k$-dimensional $\theta$-space. Let $\Omega = \omega_0 + \omega_1$ and suppose that $\theta$ is known to belong to $\Omega$, which therefore will be called the parameter space. Let $H_i$ be the hypothesis that the true parameter point is an element of $\omega_i$, ($i = 0, 1$).

We shall consider the problem of testing $H_0$ against $H_1$. Clearly, $P(W \mid h)$ can now be written as $P(W \mid \theta)$ and $\gamma(W \mid h)$ as $\gamma(W \mid \theta)$.

We shall make the following assumptions concerning $p(x \mid \theta)$:





Assumption 1. $p(x \mid \theta)$ is continuous in $\theta$. This is of course always fulfilled if $\Omega$ consists only of a finite number of points.

Assumption 2. For any bounded domain $M$ of the sample space we have

$\displaystyle\int_M [\operatorname{Max}_\theta\, p(x \mid \theta)]\,dx < \infty.$

It follows from Assumptions 1 and 2 that

(3.1) $\displaystyle\lim_{r \to \infty} \int_{S - S_r} p(x \mid \theta)\,dx = 0$

uniformly in $\theta$, where $S_r$ is the sphere in the sample space with center at the origin and radius $r$.

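In what follows, whenever we shall speak of a cumulative distribution function $g(\theta)$ in the $k$-dimensional parameter space, we shall always mean a cumulative distribution function satisfying the condition

$\displaystyle\int_\Omega dg(\theta) = 1.$

For any c.d.f. $g(\theta)$ let $W_g$ denote a critical region which contains any sample point $x$ satisfying the inequality

$\displaystyle\int_{\omega_1} p(x \mid \theta)\,dg(\theta) > \int_{\omega_0} p(x \mid \theta)\,dg(\theta),$

and does not contain a sample point $x$ for which

$\displaystyle\int_{\omega_1} p(x \mid \theta)\,dg(\theta) < \int_{\omega_0} p(x \mid \theta)\,dg(\theta).$

It can easily be verified that $W_g$ minimizes the average risk

(3.2) $\displaystyle\int_\Omega \gamma(W \mid \theta)\,dg(\theta)$, i.e., $\displaystyle\int_\Omega \gamma(W_g \mid \theta)\,dg(\theta) = \operatorname{Min}_W \int_\Omega \gamma(W \mid \theta)\,dg(\theta).$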
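The minimizing property of $W_g$ can be illustrated on a finite sample space, where the integrals become sums and every critical region can be enumerated. The Python sketch below is a discrete analogue under stated assumptions and not the continuous setting of the paper: the function names are hypothetical, and `d0[x]` and `d1[x]` stand for the prior-weighted probabilities $\int_{\omega_0} p(x \mid \theta)\,dg(\theta)$ and $\int_{\omega_1} p(x \mid \theta)\,dg(\theta)$ at sample point `x`.

```python
from itertools import combinations


def average_risk(region, d0, d1):
    """Average risk of the critical region `region` (the set of points where H_0 is rejected).

    A point inside the region contributes the weight d0 (wrong rejection of H_0);
    a point outside contributes the weight d1 (wrong acceptance of H_0).
    """
    return sum(d0[x] if x in region else d1[x] for x in range(len(d0)))


def bayes_region(d0, d1):
    """The region W_g: all sample points where the omega_1 weight exceeds the omega_0 weight."""
    return {x for x in range(len(d0)) if d1[x] > d0[x]}


def all_regions(size):
    """Every subset of the sample points 0, ..., size - 1."""
    points = range(size)
    for k in range(size + 1):
        for subset in combinations(points, k):
            yield set(subset)


if __name__ == "__main__":
    # Assumed example: one parameter point in each of omega_0 and omega_1, prior weight 1/2 each
    d0 = [0.5 * p for p in (0.6, 0.3, 0.1)]
    d1 = [0.5 * p for p in (0.2, 0.3, 0.5)]
    w_g = bayes_region(d0, d1)
    print(w_g, average_risk(w_g, d0, d1))
    print(min(average_risk(w, d0, d1) for w in all_regions(len(d0))))
```

In this example the region consists of the single point at which the second density is larger, and no other subset of the three sample points has a smaller average risk.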

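Let $\Omega_i$ ($i = 0, 1$) be the class of all density functions $p(x) = \int_{\omega_i} p(x \mid \theta)\,dg_i(\theta)$ where $g_i(\theta)$ is subject to the condition

$\displaystyle\int_{\omega_i} dg_i(\theta) = 1.$

We shall prove the following theorem:

Theorem 3.1. Under Assumptions 1 and 2, a necessary and sufficient condition for the existence of a region $W$ for which $\operatorname{Max}_\theta\, \gamma(W \mid \theta) < \tfrac{1}{2}$ is that $\Omega_0$ and $\Omega_1$ be disjoint.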





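Proof. Suppose that $\Omega_0$ and $\Omega_1$ are not disjoint. Then there exist two distribution functions $g_0(\theta)$ and $g_1(\theta)$ such that

$\displaystyle\int_{\omega_0} dg_0(\theta) = \int_{\omega_1} dg_1(\theta) = 1$

and

$\displaystyle\int_{\omega_0} p(x \mid \theta)\,dg_0(\theta) = \int_{\omega_1} p(x \mid \theta)\,dg_1(\theta)$

(except perhaps for points $x$ in a set of measure 0).

Let $g(\theta) = \tfrac{1}{2}g_0(\theta) + \tfrac{1}{2}g_1(\theta)$. Clearly, $\operatorname{Max}_\theta\, \gamma(W \mid \theta) \ge \int_\Omega \gamma(W \mid \theta)\,dg(\theta) = \tfrac{1}{2}$ for any $W$. This proves the necessity of our condition.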

We shall now assume that $\Omega_0$ and $\Omega_1$ are disjoint. First we shall show that the results of [1] can be applied. On pages 297-8 of [1] there are seven conditions listed for the sequential case. For the non-sequential case (the one considered here) the conditions 6 and 7 drop out and the first five conditions can be reduced to the following conditions:

Condition 1: The weight function $W(\theta, d)$ is bounded.

Condition 2: For any $\theta$, the chance vector $X$ admits a density function $p(x \mid \theta)$.

Condition 3: For any sequence $\{\theta_i\}$ ($i = 1, 2, \dots$, ad inf.) there exists a subsequence $\{\theta_{i_j}\}$ ($j = 1, 2, \dots$) and a parameter point $\theta_0$ such that

$\displaystyle\lim_{j \to \infty} p(x \mid \theta_{i_j}) = p(x \mid \theta_0).$

Condition 4: If $\{\theta_i\}$ ($i = 1, 2, \dots$) is a sequence of points and $\theta_0$ a point such that

$\displaystyle\lim_{i \to \infty} p(x \mid \theta_i) = p(x \mid \theta_0)$

then,

$\displaystyle\lim_{i \to \infty} W(\theta_i, d) = W(\theta_0, d)$

uniformly in $d$.

Condition 5: The same as our Assumption 2.

In our problem $d$ (the decision of the statistician) can take only two values: acceptance or rejection of $H_0$. Condition 1 is evidently fulfilled, since $W(\theta, d) = 0$ if a correct decision is made, and 1 if a wrong decision is made. Clearly, Conditions 2-5 are also fulfilled in our problem.

A distribution $g(\theta)$ is said to be least favorable, if it maximizes the minimum average risk, i.e., if it maximizes $\int_\Omega \gamma(W_g \mid \theta)\,dg(\theta)$ with respect to $g$.

It follows from Theorems *U and -1.4 of [I] that there exists a least favorable 

distribution. , . . 

Let $g^*(\theta)$ be a least favorable distribution. Then, as has been shown in [1], there exists a $W_{g^*}$ such that



108 


AGNES BERGER AND ABIIAHAM WALD 


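(3.3) $\operatorname{Max}_\theta\, \gamma(W_{g^*} \mid \theta) = \displaystyle\int_\Omega \gamma(W_{g^*} \mid \theta)\,dg^*(\theta).$

Thus, our theorem is proved if we can show that

(3.4) $\displaystyle\int_\Omega \gamma(W_{g^*} \mid \theta)\,dg^*(\theta) < \tfrac{1}{2}.$

Let $H_0^*$ be the hypothesis that the true density is given by

$p_0(x) = \dfrac{\int_{\omega_0} p(x \mid \theta)\,dg^*(\theta)}{\int_{\omega_0} dg^*(\theta)},$

and $H_1^*$ the hypothesis that the true density is given by

$p_1(x) = \dfrac{\int_{\omega_1} p(x \mid \theta)\,dg^*(\theta)}{\int_{\omega_1} dg^*(\theta)}.$

Since $\Omega_0$ and $\Omega_1$ are disjoint, $p_0(x)$ and $p_1(x)$ are different density functions. Hence, according to Lemma 2.1, there exists a critical region $W^*$ for testing $H_0^*$ such that $\alpha^* < \tfrac{1}{2}$ and $\beta^* < \tfrac{1}{2}$, where $\alpha^*$ is the probability of type I error, and $\beta^*$ is the probability of type II error. Clearly,

(3.5) $\tfrac{1}{2} > \alpha^* \displaystyle\int_{\omega_0} dg^*(\theta) + \beta^* \int_{\omega_1} dg^*(\theta) = \int_\Omega \gamma(W^* \mid \theta)\,dg^*(\theta).$

Since $W_{g^*}$ minimizes the average risk with respect to $g^*$, the right member of (3.5) is not less than the left member of (3.4), and (3.4) follows. Hence, our theorem is proved.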

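It follows from (1.1) that if $H_0$ and $H_1$ are not distinct, $\Omega_0$ and $\Omega_1$ are not disjoint.

On the other hand, suppose that $\Omega_0$ and $\Omega_1$ are not disjoint and let

$\displaystyle\int_{\omega_0} p(x \mid \theta)\,dg_0(\theta) = \int_{\omega_1} p(x \mid \theta)\,dg_1(\theta).$

Then for every $W$

$\displaystyle\int_{\omega_0} P(W \mid \theta)\,dg_0(\theta) = \int_{\omega_1} P(W \mid \theta)\,dg_1(\theta).$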

aiZZTptwfZh ’ 8et (i ■ »■ Tim, l,»„ „f ,|„, 

™ Z ZfmZl’l Un ° tl0nS *“ 


$P(W \mid \theta_0(W)) = \displaystyle\int_{\omega_0} P(W \mid \theta)\,dg_0(\theta)$

and





$P(W \mid \theta_1(W)) = \displaystyle\int_{\omega_1} P(W \mid \theta)\,dg_1(\theta)$

for every $W$.

Ilrte ", ?(■« .i‘-i... of 


/M1 ” H* - /’■ H' ; fi.ffn) 

for every H\ 

Ac !njlnwitt(t theorem: 


TjiI'‘kt;m 2. //<•< s ft < '*« f -' 'i (), ij t wnt/cr the assumptions 
i'j Ihtnrim .11, » ■'.’•‘-f '/ ■'Jr;,/;* cohdxiinti jar Ho and Iii to be distinct 

is (hit! >«. U; ’• 

REFERENCE

[1] A. Wald, “Foundations of a general theory of sequential decision functions,” Econometrica,
Vol. 15 (1947), pp. 279-313.



AN APPROXIMATION TO THE SAMPLING VARIANCE OF AN ESTI¬ 
MATED MAXIMUM VALUE OF GIVEN FREQUENCY BASED ON FIT 
OF DOUBLY EXPONENTIAL DISTRIBUTION OF MAXIMUM VALUES 1 

By Bradford F. Kimball 
N. Y. State Department of Public Service

1. Introduction. Given the doubly exponential distribution of maximum
values

(1) $F(x) = \exp(-e^{-y}), \quad y = \alpha(x - u),$

where $\alpha$ and $u$ are unknown parameters, with a prescribed frequency $F_0$ the
“reduced variate” $y$ is fixed, say at $y = y_0$. Thus with

$F_0 = .99, \quad y_0 = 4.60015.$

Given a sample of $n$ maximum values $x_i$, we are interested in the sampling
variance of

(2) $\hat g = g(\hat u, \hat\alpha) = \hat u + y_0/\hat\alpha$

due to sampling variations of the estimates $\hat u$ and $\hat\alpha$.

H. Fairfield Smith has recently pointed out to me that the examples of applications
of sufficient statistical estimation functions to this problem given in a
previous paper (see [1, pp. 307-309]) give too large a range for $\hat g = g(\hat u, \hat\alpha)$
because the sample points $(\hat u, \hat\alpha)$ within the confidence region of the constant
probability ellipse apply to optimum estimates of $(\hat u, \hat\alpha)$ rather than to that of
$\hat g = g(\hat u, \hat\alpha)$. What the problem calls for is the determination of the positions of
curves $\underline{g}(u, \alpha)$ and $\bar g(u, \alpha)$ such that the integral of the pdf of the estimation
functions over all sample values $(\hat u, \hat\alpha)$ which lie between these two curves is
equal to the confidence level (taken as .95 in previous paper). Further considerations
of this being the shortest interval $\bar g - \underline{g}$ also come into play.

As so often happens in research, the previous analysis, although not giving the 
final answer, suggests the next step. If we change our parameters to 

(3) $g = g(u, \alpha) = u + y_0/\alpha, \quad \alpha' = \alpha$

and are able to carry through the inverse of the maximum likelihood solution
for fitting of (1) to $n$ sample values $x_i$, then we shall be in a position to find the
asymptotic marginal distribution of $\sqrt{n}(\hat g - g)$, which will give the answer to our
problem (see [2]).

The Jacobian of this transformation of parameters is

$\partial(u, \alpha)/\partial(g, \alpha') = \begin{vmatrix} 1 & y_0/\alpha'^2 \\ 0 & 1 \end{vmatrix} = 1$

and hence for $\alpha' > 0$ no new singularities are introduced.

1 This involves a correction of a previous paper [1]. 

110



AN APPROXIMATION TO A SAMPLING VARIANCE 


111 


2. The equations of the maximum likelihood solution. For a sample of 
size n, the pdf of the sampling distribution in terms of the old parameters is 
given by 

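$P[u, \alpha, O_n(x_i)] = \alpha^n \exp[-\Sigma e^{-\alpha(x_i - u)}] \exp[-\Sigma \alpha(x_i - u)],$
and

$\log P = n \log \alpha - \Sigma e^{-\alpha(x_i - u)} - \alpha \Sigma x_i + n\alpha u$

$\qquad = n[\log \alpha - e^{\alpha u}(\Sigma e^{-\alpha x_i}/n) - \alpha \bar x + \alpha u].$

Now change to the new parameters and use the substitutions:

$z_i = e^{-\alpha' x_i}, \quad \bar z = (\Sigma z_i)/n, \quad z_0 = e^{-\alpha' u} = e^{y_0} e^{-\alpha' g}.$

Thus

$\partial z_0/\partial g = -\alpha' z_0, \quad \partial z_0/\partial \alpha' = -g z_0,$
and denoting $\log P$ by $L$ we write

$L = n[\log \alpha' - \bar z/z_0 - \alpha' \bar x + \alpha' g - y_0].$

Hence

(4) $L_g = -n\alpha'[\bar z/z_0 - 1];$

(5) $L_{\alpha'} = n[1/\alpha' - \partial(\bar z/z_0)/\partial\alpha' - \bar x + g].$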


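3. Derivation of expected values needed. Recall that
$\bar z/z_0 = e^{-y_0}\, \Sigma e^{-\alpha'(x_i - g)}/n = \Sigma e^{-\alpha(x_i - u)}/n.$

Hence

(6) $\partial(\bar z/z_0)/\partial\alpha' = -e^{-y_0}\, \Sigma (x_i - g) e^{-\alpha'(x_i - g)}/n,$

$\qquad \partial(\bar z/z_0)/\partial\alpha = -\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n;$

(7) $\partial^2(\bar z/z_0)/\partial\alpha'^2 = e^{-y_0}\, \Sigma (x_i - g)^2 e^{-\alpha'(x_i - g)}/n,$

$\qquad \partial^2(\bar z/z_0)/\partial\alpha^2 = \Sigma (x_i - u)^2 e^{-\alpha(x_i - u)}/n.$

By investigation of the generating function

$G(t) = E[\Sigma (z_i/z_0)^t/n], \quad z_i = e^{-\alpha x_i},$

it can be shown that

$E[\Sigma e^{-\alpha(x_i - u)}/n] = 1,$

$E[\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n] = -(1/\alpha)\Gamma'(2) = -(1/\alpha)(1 - C),$
where $C$ denotes Euler’s constant, .577216 ..., and

$E[\Sigma (x_i - u)^2 e^{-\alpha(x_i - u)}/n] = (1/\alpha^2)\Gamma''(2) = (1/\alpha^2)(\pi^2/6 + C^2 - 2C).$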
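The two gamma-function constants used in these expected values can be checked numerically. The following is a minimal sketch, not part of the original derivation: it estimates $\Gamma'(2)$ and $\Gamma''(2)$ by central differences (the step size `h` and the function name are assumptions of this illustration) and compares them with $1 - C$ and $\pi^2/6 + C^2 - 2C$.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant C


def gamma_derivatives_at_2(h=1e-4):
    """Central-difference estimates of Gamma'(2) and Gamma''(2)."""
    g_minus = math.gamma(2.0 - h)
    g_mid = math.gamma(2.0)
    g_plus = math.gamma(2.0 + h)
    first = (g_plus - g_minus) / (2.0 * h)
    second = (g_plus - 2.0 * g_mid + g_minus) / (h * h)
    return first, second


first, second = gamma_derivatives_at_2()
expected_first = 1.0 - EULER_C
expected_second = math.pi ** 2 / 6 + EULER_C ** 2 - 2.0 * EULER_C
print(first, expected_first)
print(second, expected_second)
```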



112 


BRADFORD F. KIMBALL


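Hence to find expected values of (6) and (7) we note that
$-e^{-y_0}\, \Sigma (x_i - g) e^{-\alpha'(x_i - g)}/n = -\Sigma (x_i - g) e^{-\alpha(x_i - u)}/n$

$\qquad = -\Sigma (x_i - u) e^{-\alpha(x_i - u)}/n + (y_0/\alpha)\, \Sigma e^{-\alpha(x_i - u)}/n,$

and therefore


(8) $E[\partial(\bar z/z_0)/\partial\alpha'] = E[\partial(\bar z/z_0)/\partial\alpha] + (y_0/\alpha).$

Similar analysis shows that

(9) $E[\partial^2(\bar z/z_0)/\partial\alpha'^2] = E[\partial^2(\bar z/z_0)/\partial\alpha^2] + (2y_0/\alpha) E[\partial(\bar z/z_0)/\partial\alpha] + (y_0^2/\alpha^2) E[\bar z/z_0].$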


4. The inverse of the maximum likelihood solution. It will first be noted
that the maximum likelihood equations (4) and (5) for determining best estimates
of $g$ and $\alpha'$ become identical to those for determining best estimates of old
parameters $u$ and $\alpha$, when the transformation of parameters (3) is applied to
them. This is easily verified by applying relations developed above.$^2$

This means that the best estimates $\hat g$ and $\hat\alpha'$ obtained from (4) and (5) are related
to the best estimates of old parameters $\hat u$ and $\hat\alpha$ by

(10) $\hat g = \hat u + y_0/\hat\alpha, \quad \hat\alpha' = \hat\alpha.$

We now proceed to set up the inverse of the maximum likelihood solution.
In order to do this we first need the variance-covariance matrix of the direct
solution. This is (see [2])

$\begin{pmatrix} E[-L_{gg}] & E[-L_{g\alpha'}] \\ E[-L_{\alpha' g}] & E[-L_{\alpha'\alpha'}] \end{pmatrix}.$

Now

$L_{gg} = -n\alpha'^2\, \bar z/z_0, \quad E[-L_{gg}] = n\alpha'^2,$

$L_{g\alpha'} = -n[\bar z/z_0 - 1 + \alpha'\, \partial(\bar z/z_0)/\partial\alpha'], \quad E[-L_{g\alpha'}] = n(1 - C + y_0),$

$L_{\alpha'\alpha'} = -n[1/\alpha'^2 + \partial^2(\bar z/z_0)/\partial\alpha'^2],$

$E[-L_{\alpha'\alpha'}] = (n/\alpha'^2)[\pi^2/6 + (1 - C + y_0)^2].$
Thus the variance-covariance matrix of the estimation functions (4) and (5) is

$\begin{pmatrix} n\alpha'^2 & n(1 - C + y_0) \\ n(1 - C + y_0) & (n/\alpha'^2)[\pi^2/6 + (1 - C + y_0)^2] \end{pmatrix}.$

The asymptotic form of the inverse solution for $\sqrt{n}(\hat g - g)$ and $\sqrt{n}(\hat\alpha' - \alpha')$ is

« **>■ ^ ? um 10 
read -h(S/T 6 )/°a«, (5 ' 2) ^ ^ and +&(5/2|,)/6 “ in Bec °nd equation of (5.2) should 



AN APPROXIMATION TO A SAMPLING VARIANCE


113 


(11) $\begin{pmatrix} [1 + (1 - C + y_0)^2/(\pi^2/6)]/\alpha'^2 & -(1 - C + y_0)/(\pi^2/6) \\ -(1 - C + y_0)/(\pi^2/6) & \alpha'^2/(\pi^2/6) \end{pmatrix}.$

This gives the solution sought. From the general theory of the maximum
likelihood solution (see [2]) the distribution of $[\sqrt{n}(\hat g - g), \sqrt{n}(\hat\alpha' - \alpha')]$ is
asymptotically normal. Hence the marginal distribution of $\sqrt{n}(\hat g - g)$ will be
asymptotically normal, and for finite $n$, the standard deviation may be approximated
by

(12) $\sigma(\hat g - g) \approx [1/(\alpha'\sqrt{n})]\, \sqrt{1 + (1 - C + y_0)^2/(\pi^2/6)}.$

Now the correlation coefficient for the asymptotic bivariate normal distribution
is seen to be

$r = -(1 - C + y_0)/\sqrt{\pi^2/6 + (1 - C + y_0)^2}.$

If $\alpha'$ were known, we should have the standard deviation of $\sqrt{n}(\hat g - g)$ reduced
by factor $\sqrt{1 - r^2}$. This is found to be equal to the reciprocal of the second
factor in the equation (12). Hence we conclude that if $\alpha'$ be known, the standard
deviation of $(\hat g - g)$, for finite $n$, is given approximately by

(13) $\sigma(\hat g - g) \approx 1/(\alpha'\sqrt{n}).$
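The passage from the variance-covariance matrix of the direct solution to (12) is a 2 x 2 matrix inversion, and it can be checked numerically. The sketch below is an illustration only: the values of `n` and `alpha` are hypothetical, and the helper names are not from the paper. It inverts the matrix, reads off the standard deviation of $\hat g$, and compares it with (12) and with the stated correlation coefficient.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant C


def information_matrix(n, alpha, y0):
    """Variance-covariance matrix of the estimation functions (4) and (5)."""
    k = 1.0 - EULER_C + y0
    return [[n * alpha ** 2, n * k],
            [n * k, (n / alpha ** 2) * (math.pi ** 2 / 6 + k * k)]]


def invert_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sigma_g_formula(n, alpha, y0):
    """Right-hand side of (12)."""
    k = 1.0 - EULER_C + y0
    return math.sqrt(1.0 + k * k / (math.pi ** 2 / 6)) / (alpha * math.sqrt(n))


n, alpha, y0 = 50, 0.02, 4.60015  # n and alpha are hypothetical inputs
cov = invert_2x2(information_matrix(n, alpha, y0))
sigma_from_inverse = math.sqrt(cov[0][0])
corr_from_inverse = cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])
k = 1.0 - EULER_C + y0
corr_formula = -k / math.sqrt(math.pi ** 2 / 6 + k * k)
print(sigma_from_inverse, sigma_g_formula(n, alpha, y0))
print(corr_from_inverse, corr_formula)
```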


5. An example. Using same example outlined in previous paper (see [1,
pp. 307-309]), we have $n = 57$, $\hat\alpha = .01921$, $1 - C = .422784$, $y_0 = 4.60015$.
This gives $\sigma = 27.826$. For 95% confidence interval we take $(1.96)\sigma = 54.54$,
and with a «* lK0.fi, 

$\hat g = \hat u + y_0/\hat\alpha = 419.7,$
and the interval is approximated by

$|\hat g - g| < 54.5,$

which as an approximation gives the symmetrical interval

$365.2 < g < 474.2.$

Method 4 used in previous paper gave the longer interval (see Introduction)
which was not symmetrical about $\hat g$:

$302.8 < g < 507.1.$
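The arithmetic of this example can be retraced with a few lines of code. This is a sketch under the rounded inputs printed in the text ($n = 57$, $\hat\alpha = .01921$, $y_0 = 4.60015$, $\hat g = 419.7$); with these rounded values formula (12) gives a standard deviation of about 27.87 and a half-width of about 54.6, within roughly half a percent of the figures quoted, the small difference presumably reflecting rounding of $\hat\alpha$. The function name is an assumption of this illustration.

```python
import math

EULER_C = 0.5772156649015329  # Euler's constant C


def sigma_g(n, alpha_hat, y0):
    """Approximate standard deviation of g-hat from formula (12)."""
    k = 1.0 - EULER_C + y0
    return math.sqrt(1.0 + k * k / (math.pi ** 2 / 6)) / (alpha_hat * math.sqrt(n))


n, alpha_hat, y0, g_hat = 57, 0.01921, 4.60015, 419.7
sigma = sigma_g(n, alpha_hat, y0)
half_width = 1.96 * sigma
interval = (g_hat - half_width, g_hat + half_width)
print(round(sigma, 2), round(half_width, 1), interval)
```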


REFERENCES

[1] B. F. Kimball, “Sufficient statistical estimation functions for the parameters of the
distribution of maximum values,” Annals of Math. Stat., Vol. 17 (1946), pp.
299-309.

[2] S. S. Wilks, Mathematical Statistics, Princeton Univ. Press, 1943, p. 139.



NOTES 

This section is devoted to brief research and expository articles and other short items.


TESTS OF INDEPENDENCE IN CONTINGENCY TABLES 
AS UNCONDITIONAL TESTS 

By A. M. Mood 
Iowa State College 1 

Summary and introduction. Since the ordinary tests for independence in
contingency tables use test criteria whose distributions depend on unknown
parameters, the justification for the tests is usually made either by an appeal to
asymptotic theory or by interpreting the tests as conditional tests. The latter
approach employs the conditional distribution of the cell frequencies given the
marginal totals, and was first described by Fisher [1]. The purpose of the
present note is to show how these tests may be regarded as unconditional tests
even though the parameters are unknown by augmenting the test criterion to
include estimates of the unknown parameters. We present no new tests,
merely a new setting for the old tests which seems to put them in a little better
light.


1. Certain conditional tests. A variate or set of variates $x$ has a probability
density function $f(x; \theta)$ under a null hypothesis involving a parameter or set of
parameters $\theta$. When the parameters have a set of sufficient estimators $\hat\theta$, the
joint density function of a random sample of size $n$ may be put in the form

(1) $\prod_{i=1}^{n} f(x_i; \theta) = g(x_1, x_2, \ldots, x_n \mid \hat\theta)\, h(\hat\theta; \theta).$

with th Tr ber ° f parameters ' w « ^ ^ 

LetVlx I criteria which are not functions of the estimators alone. 

Let $\lambda(x_1, x_2, \ldots, x_n)$ be a test criterion which may not be put in the form $\lambda(\hat\theta)$.

TX otSorm °' X “ d K ° bW “ ed % 5Ummb ' ^** ** X «d 

(2) $k(\lambda \mid \hat\theta)\, h(\hat\theta; \theta).$

aioZThl £X‘ i0 “ ot x wi " be ^ wit or 

»« liko to divide the A 
way that S e would have a mL, -k \ eS1 ° n ^ anc tt critic ^ region .S'* in such a 

' Tt. »th„ * willl g „ lU 

1H 



Ti:.STf> OP INDEPENDENCE 


115 


interested here only in the fact that the size of $S_c$ cannot be determined because
of the presence of the unknown parameters $\theta$ in $m(\lambda; \theta)$.

One can set up a conditional test by using the conditional distribution $k(\lambda \mid \hat\theta)$.
That is, for fixed $\hat\theta$, the measure of any region $R(\hat\theta)$ (which is measurable relative
to $k(\lambda \mid \hat\theta)$, say, in the Lebesgue-Stieltjes sense) of the $\lambda$ space is known because
the $\hat\theta$ are known in any given instance. Thus a conditional test can be made
with a critical region $R_c(\hat\theta)$ of prescribed size.

The conditional test may be interpreted as an unconditional test in the present
instance, in the following manner: the unconditional test is made by using the
double criterion $(\lambda, \hat\theta)$. The $(\lambda, \hat\theta)$ space is divided into two regions, $T_a$ for
acceptance and $T_c$ for rejection. The critical region $T_c$ consists of all points
$(\lambda, \hat\theta)$ such that $\lambda$ is contained in $R_c(\hat\theta)$. If the size of $R_c(\hat\theta)$ is $\alpha$ for all $\hat\theta$, then
the size of $T_c$ is also $\alpha$, for

(3) $\iint_{T_c} k(\lambda \mid \hat\theta)\, h(\hat\theta; \theta)\, d\lambda\, d\hat\theta = \int_{-\infty}^{\infty} \left[\int_{R_c(\hat\theta)} k(\lambda \mid \hat\theta)\, d\lambda\right] h(\hat\theta; \theta)\, d\hat\theta$

$\qquad = \int_{-\infty}^{\infty} \alpha\, h(\hat\theta; \theta)\, d\hat\theta$

$\qquad = \alpha.$

In this way one can make an unconditional test of the hypothesis with a critical
region of prescribed size; of course one does not have complete freedom to
specify the shape of $T_c$, but he can control it to the extent that $R_c(\hat\theta)$ may be
chosen arbitrarily for every $\hat\theta$. $T_c$ is of course a similar region in the sense of
Neyman and Pearson [2, 3, 4] for the augmented criterion, and the construction
of $T_c$ is essentially the same as that used by Neyman and Pearson to test parameters
with sufficient estimators.

2. Application to contingency tables. As an illustration we shall follow
Wilks’ [5] treatment of a two-way table with $r$ rows and $c$ columns; the cell
frequencies are $n_{ij}$ and the cell probabilities are $p_{ij}$ with

(4) $\sum_{i,j} n_{ij} = n; \quad \sum_{i,j} p_{ij} = 1; \quad i = 1, 2, \ldots, r; \quad j = 1, 2, \ldots, c.$

The sample is thus regarded as having come from a multinomial population.
We let

(5) $p_{i\cdot} = \sum_j p_{ij}; \quad p_{\cdot j} = \sum_i p_{ij}; \quad n_{i\cdot} = \sum_j n_{ij}; \quad n_{\cdot j} = \sum_i n_{ij}.$

The null hypothesis $H_0$ (of independence) corresponds to the subspace for which

(6) $p_{ij} = p_i q_j; \quad \sum_i p_i = 1 = \sum_j q_j$

in the parameter space of the $p_{ij}$. The likelihood ratio criterion for testing $H_0$ is

(7) $\lambda = \dfrac{\left(\prod_i n_{i\cdot}^{\,n_{i\cdot}}\right)\left(\prod_j n_{\cdot j}^{\,n_{\cdot j}}\right)}{n^n \prod_{i,j} n_{ij}^{\,n_{ij}}}$

116 


A. M. MOOD 


and its distribution depends on the unknown parameters $p_i$ and $q_j$. However
the parameters have sufficient estimators

(8) $\hat p_i = n_{i\cdot}/n, \quad \hat q_j = n_{\cdot j}/n$

for the marginal distribution of the $n_{i\cdot}$ and $n_{\cdot j}$ is

(9) $\dfrac{(n!)^2}{\left(\prod_i n_{i\cdot}!\right)\left(\prod_j n_{\cdot j}!\right)} \prod_i p_i^{\,n_{i\cdot}} \prod_j q_j^{\,n_{\cdot j}}$

and when this is divided into the distribution of the $n_{ij}$ (under the null hypothesis)
one finds the conditional distribution of the $n_{ij}$ to be

(10) $g(n_{11}, \ldots, n_{rc} \mid n_{1\cdot}, n_{2\cdot}, \ldots, n_{\cdot c}) = \dfrac{\left(\prod_i n_{i\cdot}!\right)\left(\prod_j n_{\cdot j}!\right)}{n! \prod_{i,j} n_{ij}!}$

which is independent of the parameters. The distribution (10) is just the
combinatorial distribution used ordinarily in deriving the distribution of $\lambda$
for small samples. The test for independence is therefore a conditional test
which however may be interpreted as an unconditional test if the criterion $\lambda$ is
augmented by the estimators of the parameters under the null hypothesis.
Instead of the likelihood ratio criterion Karl Pearson’s Chi-square criterion
could just as well have been used since its conditional distribution is also determined
by (10).
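The conditional distribution (10) is easy to tabulate for a small table. The sketch below is an illustration only: it evaluates (10) with exact fractions for every 2 x 2 table sharing a hypothetical set of margins (row totals 5 and 4, column totals 3 and 6) and confirms that the probabilities add to one without reference to $p_i$ or $q_j$. The function names are assumptions of this illustration.

```python
from fractions import Fraction
from math import factorial


def conditional_probability(table):
    """Formula (10): probability of the cell counts given both sets of margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    numerator = 1
    for total in row_totals + col_totals:
        numerator *= factorial(total)
    denominator = factorial(n)
    for row in table:
        for cell in row:
            denominator *= factorial(cell)
    return Fraction(numerator, denominator)


def tables_2x2(row_totals, col_totals):
    """All 2 x 2 tables of non-negative counts with the given margins."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    for n11 in range(max(0, r1 - c2), min(r1, c1) + 1):
        yield [[n11, r1 - n11], [c1 - n11, r2 - (c1 - n11)]]


probs = [conditional_probability(t) for t in tables_2x2((5, 4), (3, 6))]
print(probs, sum(probs))
```

A conditional critical region $R_c(\hat\theta)$ is then a set of these tables whose total probability does not exceed the chosen level.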

The usual difficulty due to discreteness arises in this application to contingency
tables. It is not possible to make the significance level exactly $\alpha$. In terms
of the notation of the first section, $R_c(\hat\theta)$ cannot be chosen so that it will have
size exactly equal to $\alpha$ for all $\hat\theta$. One would ordinarily replace the equalities by
inequalities. The $R_c(\hat\theta)$ would be chosen to have size less than but as close to $\alpha$
as possible. The size of $T_c$ is then unspecified and one can only state that his
significance level is less than $\alpha$. This difficulty is not particularly serious in
practice unless the test criterion has only one degree of freedom.

REFERENCES

[1] R. A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd.

[2] J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of
statistical hypotheses,” Roy. Soc. Phil. Trans., Series A, Vol. 231 (1933), p. 289.

[3] J. Neyman and E. S. Pearson, “Sufficient statistics and uniformly most powerful
tests of statistical hypotheses,” Stat. Res. Memoirs, Vol. 1 (1936), pp. 113-137.

[4] J. Neyman, “Outline of a theory of statistical estimation based on the classical theory
of probability,” Roy. Soc. Phil. Trans., Series A, Vol. 236 (1937), pp. 333-380.

[5] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943.



5% SIGNIFICANCE LEVELS


117 


THE 5% SIGNIFICANCE LEVELS FOR SUMS OF SQUARES 
OF RANK DIFFERENCES AND A CORRECTION 


By Edwin G. Olds
Carnegie Institute of Technology 


About ten years ago this author published a paper [1], containing tables for
use in testing the significance of the rank correlation coefficient. In a paper on
non-parametric tests, [2, p. 310] Scheffé remarks that it would be desirable
to have these tables extended by inclusion of the 5% values. When the computation
was begun it was noted that a necessary formula was given incorrectly.
The main purpose of this note is to correct the formula and to extend Table V,
[1, p. 148]. Incidentally, a minor addition for Table III, [1, p. 143] will be
supplied.

The formula for the rank correlation coefficient, $r'$, is given by

$r' = 1 - \dfrac{6\, \Sigma d^2}{n^3 - n},$

where $n$ is the number of individuals ranked and $\Sigma d^2 = \sum_{i=1}^{n} d_i^2$ ($d_i$ being the rank
difference for the $i$th individual). As noted in the original paper, the null
hypothesis, $r' = 0$, is equivalent to the hypothesis $\Sigma d^2 = (n^3 - n)/6$, and the
latter hypothesis is slightly more convenient to test. Scheffé’s remark seems
to be directed at Table V, which gives, for $11 \le n \le 30$, pairs of values between
which $\Sigma d^2$ has a probability, $P$, of being included. Values are tabled for $P$ = .99,
.98, .96, .90 and .80. The necessary values for $P$ = .95 are given below and
can easily be copied in the left-hand margin of the original Table [1, p. 148].
These values, as in the previous case, have been calculated by using the fact that


has an approximately normal distribution with a mean of zero and a variance of
$(n - 1)[n(n + 1)/12]^2$. In the original paper, [1, p. 142] the denominator
in the bracketed part of the variance was printed as 6, instead of 12.
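The relation between $r'$ and $\Sigma d^2$, and the null value $(n^3 - n)/6$, can be verified by direct enumeration for small $n$. The sketch below is illustrative only (the function names are not from the paper): it computes $\Sigma d^2$ and $r'$ for a ranking compared with the natural order, and averages $\Sigma d^2$ over all $n!$ equally likely rankings.

```python
from fractions import Fraction
from itertools import permutations


def sum_sq_rank_diff(ranking):
    """Sum of squared differences between the ranks 1..n and a second ranking."""
    return sum((i - r) ** 2 for i, r in enumerate(ranking, start=1))


def rank_correlation(ranking):
    """r' = 1 - 6 * sum(d^2) / (n^3 - n), kept as an exact fraction."""
    n = len(ranking)
    return 1 - Fraction(6 * sum_sq_rank_diff(ranking), n ** 3 - n)


def mean_sum_sq(n):
    """Average of sum(d^2) over all n! equally likely rankings."""
    values = [sum_sq_rank_diff(p) for p in permutations(range(1, n + 1))]
    return Fraction(sum(values), len(values))


print(rank_correlation((2, 1, 4, 3, 5)), mean_sum_sq(5))
```

For $n = 5$ the average of $\Sigma d^2$ over all rankings equals $(n^3 - n)/6 = 20$, the value at which $r' = 0$.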

In this author’s original paper the exact frequencies of sums of squares of
rank differences were given for $n = 2$ to $n = 7$ inclusive, [1, p. 139]. The same
results, together with the results for $n = 8$, were obtained (independently) by
Kendall and others and published some months later, [3, p. 255]. Therefore,
it is possible to extend slightly the comparison of approximating functions
given in Table III, [1, p. 143]. Using Kendall’s results for $n = 8$ it is found
that when the approximations obtained by using a Pearson Type II curve are
compared with exact results the average and maximum differences of cumulatives
are .0013 and .0007 respectively. When approximations are made by using the
normal curve the corresponding errors are .0081 and .0163.



118 


EDWIN G. OLDS


REFERENCES 

[1] E. G. Olds, “Distribution of the sums of squares of rank differences for small numbers
of individuals,” Annals of Math. Stat., Vol. 9 (1938), pp. 133-148.

[2] H. Scheffé, “Statistical inference in the non-parametric case,” Annals of Math. Stat.,
Vol. 14 (1943), pp. 305-332.

[3] M. G. Kendall, Sheila F. H. Kendall and B. Babington Smith, “The distribution
of Spearman’s coefficient of rank correlation in a universe in which all rankings
occur an equal number of times,” Biometrika, Vol. 30 (1938), pp. 251-273.



20X6 7 
2275,7 
2556,2 
2859,0 


1(500,7 

1928.0 

2214.9 

2528.5 

2809.8 
8210.(1 
3040.2 
4071,0 

4.135.3 

5032.3 

5503.8 
0131,0 



NON-NEGATIVE QUADRATIC FORMS


119 


INDEPENDENCE OF NON-NEGATIVE QUADRATIC FORMS IN
NORMALLY CORRELATED VARIABLES

By Bertil Matérn

Forest Research Institute, Experimentalfältet, Sweden

In a recent paper by the author [5] the following theorem has been mentioned
without proof. Though the theorem is very simple and easy to prove the
author has not found it elsewhere in the literature.

Theorem. If two non-negative quadratic forms in normally correlated variables
with zero means are uncorrelated the two forms are independent.

To prove the theorem, let the two forms be

(1) $Q_1 = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} x_i x_j, \quad Q_2 = \sum_{i=1}^{n}\sum_{j=1}^{n} b_{ij} x_i x_j,$

where the $x_i$'s are normally correlated and all have mean 0. By a well-known
theorem on quadratic forms we can reduce $Q_1$ and $Q_2$ to the forms

(2) $Q_1 = \sum_{i=1}^{n} c_i y_i^2, \quad Q_2 = \sum_{i=1}^{n} d_i z_i^2,$

where the $y_i$'s and $z_i$'s are linear functions of the $x_i$'s. In the $2n$-dimensional
normal distribution of the $y_i$'s and the $z_i$'s, let $\rho_{ij}$ be the covariance of $y_i$ and
$z_j$. It is then easily shown that the covariance of $y_i^2$ and $z_j^2$ is $2\rho_{ij}^2$, and hence
that

(3) $\operatorname{cov}(Q_1, Q_2) = 2 \sum_{i=1}^{n}\sum_{j=1}^{n} c_i d_j \rho_{ij}^2.$

As the forms are supposed to be non-negative all coefficients in (2) are non-negative.
If $Q_1$ and $Q_2$ are uncorrelated, each term on the right hand of (3)
must vanish. Consequently, if $c_i \ne 0$ and $d_j \ne 0$, we must have $\rho_{ij} = 0$. This
means that all $y_i$'s in $Q_1$ with non-zero coefficients are independent of all $z_j$'s in
$Q_2$ with non-zero coefficients. Hence $Q_1$ and $Q_2$ are independent. Q.E.D.

To see if $Q_1$ and $Q_2$ are uncorrelated we need an expression for the covariance
of the two forms in terms of the coefficients in (1) and the variances and covariances
of the original variables $x_i$. Let $A$ and $B$ be the matrices of the two
forms (1). Clearly we may suppose $A$ and $B$ to be symmetric. Let the variance-covariance
matrix of the $x_i$'s be $L$. By straightforward calculations we find

(4) $\operatorname{cov}(Q_1, Q_2) = 2 \operatorname{Tr} ALBL.$

Here we have used $\operatorname{Tr} M$ to denote the “trace,” i.e. the sum of the diagonal
elements in a square matrix $M$. In case of independent variables with variance 1,
we get

(5) $\operatorname{cov}(Q_1, Q_2) = 2 \operatorname{Tr} AB.$

The formulae (4) and (5) are given in [5].
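Formula (4) can be checked against the fourth-moment rule for zero-mean normal variables, $\operatorname{cov}(x_i x_j, x_k x_l) = L_{ik}L_{jl} + L_{il}L_{jk}$. The sketch below is illustrative only: the matrices `A`, `B` and `L` are hypothetical, and the helper names are not from the paper. It sums the term-by-term covariances and compares the total with $2 \operatorname{Tr} ALBL$.

```python
from itertools import product


def matmul(x, y):
    """Product of two square matrices given as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def cov_termwise(a, b, cov):
    """Sum of a_ij b_kl cov(x_i x_j, x_k x_l) using the normal fourth-moment rule."""
    n = len(cov)
    total = 0.0
    for i, j, k, l in product(range(n), repeat=4):
        total += a[i][j] * b[k][l] * (cov[i][k] * cov[j][l] + cov[i][l] * cov[j][k])
    return total


def cov_trace(a, b, cov):
    """Right-hand side of (4): 2 Tr(A L B L)."""
    return 2.0 * trace(matmul(matmul(matmul(a, cov), b), cov))


# Hypothetical symmetric non-negative forms and a positive definite L.
A = [[2.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 3.0]]
L = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 1.5]]
print(cov_termwise(A, B, L), cov_trace(A, B, L))
```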



120 


HERMANN VON SCHELLING


It is interesting to note the simplification of the independence condition given
in [2, 3] which is possible when the forms are assumed to be non-negative. It
may also be of interest to note that the condition for independence given in
the present theorem is identical with the corresponding condition for two linear
forms. (In fact, the latter condition has been used in the above proof.) Further
we observe that if $Q_1$ is the square of a linear form with mean 0, we get a necessary
and sufficient condition for independence between a linear form and a non-negative
quadratic form. The corresponding condition when $Q_2$ is not supposed
to be non-negative has been given in [4].

As an application consider a quadratic form $Q$ in normally correlated variables.
Let it be known that $Q$ has a $\chi^2$-distribution with $f$ degrees of freedom. If
further

(6) $Q = Q_1 + Q_2 + \cdots + Q_k,$

where the $Q_i$'s are non-negative and mutually uncorrelated quadratic forms,
then each $Q_i$ has a $\chi^2$-distribution with $f_i$ degrees of freedom, say, and $\Sigma f_i = f$.
The proof with the aid of the above theorem is almost immediate. We thus
get another formulation of the theorem of Cochran [1] on the decomposition of a
quadratic form.


REFERENCES 

[1] W. G. Cochran, “Distribution of quadratic forms in a normal system with applications
to the analysis of covariance,” Proc. Cambr. Phil. Soc., Vol. 30 (1934), pp. 178-191.

[2] A. T. Craig, “Note on the independence of certain quadratic forms,” Annals of Math.
Stat., Vol. 14 (1943), pp. 195-197.

[3] H. Hotelling, “Note on a matric theorem of A. T. Craig,” Annals of Math. Stat.,
Vol. 15 (1944), pp. 427-429.

[4] M. Kac, “A remark on independence of linear and quadratic forms involving independent
gaussian variables,” Annals of Math. Stat., Vol. 16 (1945), pp. 400-401.

[5] B. Matérn, “Metoder att uppskatta noggrannheten vid linje- och provytetaxering”
(“Methods of estimating the accuracy of line and sample plot surveys”),
Meddelanden från Statens Skogsforskningsinstitut, Vol. 36 (1947), pp. 1-138.


A FORMULA FOR THE PARTIAL SUMS OF SOME 
HYPERGEOMETRIC SERIES 


By Hermann von Schelling
Naval Medical Research Laboratory, New London, Conn. 1

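An urn contains $a$ black and $b$ white balls, $a + b = N$. After each drawing the
ball drawn is replaced and $\Delta$ further balls of the same color are put into the urn.
The probability $w(n_1)$ of drawing $n_1$ black balls in $n$ drawings is given
by a formula due to F. Eggenberger and G. Pólya [1]: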



HYPERGEOMETRIC PARTIAL SUMS


121 


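(1) $w(n_1) = \binom{n}{n_1} \dfrac{a(a + \Delta) \cdots [a + (n_1 - 1)\Delta] \cdot b(b + \Delta) \cdots [b + (n - n_1 - 1)\Delta]}{N(N + \Delta) \cdots [N + (n - 1)\Delta]}$

($n$ fixed, $n_1$ variable).

Now, we may ask for the probability that the $n_1$th black ball appears at
the $n$th drawing. We find

(2) $w(n) = \binom{n - 1}{n_1 - 1} \dfrac{a(a + \Delta) \cdots [a + (n_1 - 1)\Delta] \cdot b(b + \Delta) \cdots [b + (n - n_1 - 1)\Delta]}{N(N + \Delta) \cdots [N + (n - 1)\Delta]}$

($n_1$ fixed, $n$ variable).

This function is the $(n - n_1 + 1)$th element of the series

$\dfrac{\frac{a}{\Delta}\left(\frac{a}{\Delta} + 1\right) \cdots \left(\frac{a}{\Delta} + n_1 - 1\right)}{\frac{N}{\Delta}\left(\frac{N}{\Delta} + 1\right) \cdots \left(\frac{N}{\Delta} + n_1 - 1\right)}\, F\left(n_1, \frac{b}{\Delta}, \frac{N}{\Delta} + n_1; 1\right).$

Consequently, the probability that the $n_1$th black ball appears at the latest in the
$n$th drawing reads, with an obvious abbreviation,

(3) $W(n) = \dfrac{\frac{a}{\Delta}\left(\frac{a}{\Delta} + 1\right) \cdots \left(\frac{a}{\Delta} + n_1 - 1\right)}{\frac{N}{\Delta}\left(\frac{N}{\Delta} + 1\right) \cdots \left(\frac{N}{\Delta} + n_1 - 1\right)}\, F_{n - n_1 + 1}\left(n_1, \frac{b}{\Delta}, \frac{N}{\Delta} + n_1; 1\right).$

Now, we assume the $n_1$th black ball did not appear in the $n$th drawing. What
is the alternative? The $(n - n_1 + 1)$th white ball must have appeared in the
$n$th drawing at latest. The corresponding probability is according to the
equation (3)

(4) $\overline{W}(n) = \dfrac{\frac{b}{\Delta}\left(\frac{b}{\Delta} + 1\right) \cdots \left(\frac{b}{\Delta} + n - n_1\right)}{\frac{N}{\Delta}\left(\frac{N}{\Delta} + 1\right) \cdots \left(\frac{N}{\Delta} + n - n_1\right)}\, F_{n_1}\left(n - n_1 + 1, \frac{a}{\Delta}, \frac{N}{\Delta} + n - n_1 + 1; 1\right).$

The relation (4) originates from (3) by writing $b$ instead of $a$ and $(n - n_1 + 1)$
instead of $n_1$. The alternatives add to certainty:

(5) $W(n) + \overline{W}(n) = 1.$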



122 


HERMANN VON SCHELLING


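Change the notations in the following manner:

(6) $n_1 \to \alpha; \quad \dfrac{b}{\Delta} \to \beta; \quad \dfrac{N}{\Delta} + n_1 \to \gamma; \quad n - n_1 + 1 \to \nu.$

From (6.1) and (6.4) find by addition

(7) $n \to \nu + \alpha - 1.$

From (6.1) and (6.3)

(8) $\dfrac{N}{\Delta} \to \gamma - \alpha.$

From (6.2) and (8)

(9) $\dfrac{a}{\Delta} = \dfrac{N}{\Delta} - \dfrac{b}{\Delta} \to \gamma - \alpha - \beta.$

Formula (5) reads now

(10) $\dfrac{(\gamma - \alpha - \beta)(\gamma - \alpha - \beta + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)}\, F_\nu(\alpha, \beta, \gamma; 1)$

$\qquad + \dfrac{\beta(\beta + 1) \cdots (\beta + \nu - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - \alpha + \nu - 1)}\, F_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1) = 1.$

$F_\nu(\alpha, \beta, \gamma; 1)$ denotes the partial sum of the first $\nu$ elements of the hypergeometric
series $F(\alpha, \beta, \gamma; 1)$. It is to be mentioned that $\alpha$ is a positive integer necessarily
as follows from (6.1). Since

$W(\infty) = \dfrac{(\gamma - \alpha - \beta)(\gamma - \alpha - \beta + 1) \cdots (\gamma - \beta - 1)}{(\gamma - \alpha)(\gamma - \alpha + 1) \cdots (\gamma - 1)}\, F(\alpha, \beta, \gamma; 1) = 1,$

the relation (10) can be written

(11) $\dfrac{F_\nu(\alpha, \beta, \gamma; 1)}{F(\alpha, \beta, \gamma; 1)} + \dfrac{F_\alpha(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1)}{F(\nu, \gamma - \beta - \alpha, \gamma - \alpha + \nu; 1)} = 1,$

where $\nu$ and $\alpha$ are positive integers.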
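Relation (10) lends itself to a direct numerical check. The sketch below is illustrative only: the parameter values are hypothetical, chosen with $\gamma - \alpha - \beta > 0$ and $\beta > 0$, and the function names are not from the paper. It forms the partial sums $F_\nu$ term by term and evaluates the left-hand side of (10).

```python
def partial_hypergeometric(a, b, c, terms):
    """Sum of the first `terms` elements of the series F(a, b, c; 1)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Ratio of consecutive elements at argument 1.
        term *= (a + k) * (b + k) / ((c + k) * (k + 1))
    return total


def rising_product(x, m):
    """x (x + 1) ... (x + m - 1)."""
    result = 1.0
    for k in range(m):
        result *= x + k
    return result


def left_side_of_10(alpha, nu, beta, gamma):
    """Left-hand side of relation (10) for positive integers alpha and nu."""
    first = (rising_product(gamma - alpha - beta, alpha)
             / rising_product(gamma - alpha, alpha)
             * partial_hypergeometric(alpha, beta, gamma, nu))
    second = (rising_product(beta, nu)
              / rising_product(gamma - alpha, nu)
              * partial_hypergeometric(nu, gamma - beta - alpha, gamma - alpha + nu, alpha))
    return first + second


print(left_side_of_10(3, 4, 1.5, 7.0))
```

The first summand needs $\nu$ elements and the second only $\alpha$, which is why the relation saves work when $\alpha$ is small.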

This result is not interesting from the standpoint of pure mathematics since 

the sum F(a, /3, 7 ; 1 ) is known. But the relation is useful for the statisticians, 

In calculating the function $W(n)$ they need a sum of $n_1$ elements instead of
$(n - n_1 + 1)$. If $n_1$ is small (and this holds in practical applications), the
exact calculation of $W(n)$ is possible for every $n$.


REFERENCES 

[1] F. Eggenberger and G. Pólya, Zeits. f. angew. Math. und Mech., Vol. 3 (1923), pp.
279-289.

[2] H. von Schelling, Naturwiss., Vol. 30 (1942), pp. 757-758.



VARIANCE OF PROPORTIONS OF SAMPLES


123 


THE VARIANCE OF THE PROPORTIONS OF SAMPLES FALLING 
WITHIN A FIXED INTERVAL FOR A 
NORMAL POPULATION 


By G. A. Baker
University of California, Davis


Suppose that we have a normal population

(1) $y = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{(x - m)^2}{2\sigma^2}\right]$

and we draw samples of $N$ from this population. We wish to estimate the
proportion, $p$, of the population between two fixed limits, $m + \lambda\sigma$ and $m + \mu\sigma$.
One way to make this estimate is simply to count the number of observed $x$'s
which fall in this interval. We shall denote this number by $n$. Then the ratio

(2) $n/N$

is an estimate of $p$. If this is done the variance of the estimate is well known to be

(3) $\dfrac{p(1 - p)}{N}.$


The method of estimating $p$ by counting the number in a definite interval is
nonparametric and requires no assumption of normal or other specified type of
sampled population for validity. However, if we know that the sampled
population is normal then we may make use of this knowledge in estimating $p$
and possibly obtain an improved estimate.

Another way to estimate $p$ which makes use of the form of the sampled
population is to compute

(4) $\bar x = \dfrac{1}{N}\sum_{i=1}^{N} x_i, \quad s^2 = \dfrac{1}{N}\sum_{i=1}^{N} (x_i - \bar x)^2,$

and hence the integral

(5) $I = \dfrac{1}{s\sqrt{2\pi}} \int_{m + \lambda\sigma}^{m + \mu\sigma} \exp\left[-\dfrac{(x - \bar x)^2}{2s^2}\right] dx.$

It is implied in elementary texts that (5) is a better estimate of $p$ than is (2)
although this point is not discussed.

It is the purpose of the present note to discuss the variance of the estimate (5)
and compare this variance with (3).

Now (5) is a function of the first two moments of the sample and it follows
from an application of a theorem stated by H. Cramér [1] that (5) is asymptotically
normal with mean $p$ and variance given by


124 


G. A. BAKER


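(6) $\dfrac{1}{2\pi N}\left[\left(e^{-\frac{1}{2}\lambda^2} - e^{-\frac{1}{2}\mu^2}\right)^2 + \tfrac{1}{2}\left(\lambda e^{-\frac{1}{2}\lambda^2} - \mu e^{-\frac{1}{2}\mu^2}\right)^2\right].$

To compare the relative efficiency of the counting method with (6) in complete
detail would be somewhat tedious. The referee suggests a brief discussion of
the cases $\lambda = -\infty$, where we are counting the proportion less than some known
value, and $\lambda = -\mu$, where a portion out of the middle of the distribution is being
counted. These cases are of particular practical interest.
If $\lambda = -\infty$, then (6) becomes

(7) $\dfrac{e^{-\mu^2}}{2\pi N}\left(1 + \tfrac{1}{2}\mu^2\right).$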
We choose values of $\mu$ as indicated below:

       μ         p      Relative Efficiency of (3)
    -2.3263     0.01              0.27
    -1.2816     0.1               0.56
    -0.8416     0.2               0.66
    -0.5244     0.3               0.75
    -0.2533     0.4               0.64
     0.0000     0.5               0.64

We get values of the relative efficiency of (3) that are low for small $p$ and somewhat
higher for larger values of $p$.

If $\lambda = -\mu$, then (6) becomes

$\dfrac{\mu^2 e^{-\mu^2}}{\pi N}.$

We choose values of $\mu$ as indicated below:

       μ         p      Relative Efficiency of (3)
     1.2816     0.8               0.63
     0.8416     0.6               0.46
     0.2533     0.2               0.12

We see that the relative efficiency of (3) ranges from close to 0.75 to rather small
values.

Other choices of $\lambda$ and $\mu$ yield relative efficiencies of about the same order of
magnitude as those illustrated.
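The relative efficiency of the counting estimate is the ratio of (6) to (3), and $N$ cancels. The following sketch is illustrative only, with function names that are not from the paper; it evaluates that ratio for given $\lambda$ and $\mu$, allowing $\lambda = -\infty$. For instance it gives about 0.27 at $p = 0.01$ and 0.64 at $p = 0.5$ in the one-sided case, and about 0.63, 0.46 and 0.12 in the symmetric case.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _e(x):
    """exp(-x^2 / 2), taken as zero at plus or minus infinity."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x)


def _xe(x):
    """x exp(-x^2 / 2), taken as zero at plus or minus infinity."""
    return 0.0 if math.isinf(x) else x * math.exp(-0.5 * x * x)


def n_times_variance_fitted(lam, mu):
    """N times the asymptotic variance (6) of the estimate (5)."""
    return ((_e(lam) - _e(mu)) ** 2 + 0.5 * (_xe(lam) - _xe(mu)) ** 2) / (2.0 * math.pi)


def proportion(lam, mu):
    """Proportion p of a normal population between m + lam*sigma and m + mu*sigma."""
    lower = 0.0 if lam == -math.inf else STD_NORMAL.cdf(lam)
    upper = 1.0 if mu == math.inf else STD_NORMAL.cdf(mu)
    return upper - lower


def relative_efficiency_of_counting(lam, mu):
    """Variance (6) divided by the binomial variance (3)."""
    p = proportion(lam, mu)
    return n_times_variance_fitted(lam, mu) / (p * (1.0 - p))


print(relative_efficiency_of_counting(-math.inf, 0.0))
print(relative_efficiency_of_counting(-1.2816, 1.2816))
```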


REFERENCE

[1] Harald Cramér, “Mathematical Methods of Statistics,” Princeton University Press,
1946, section 28.4, pp. 366-367.




125 


biserial coefficient of correlation 


THE POINT BISERIAL COEFFICIENT OF CORRELATION 


By Joseph Lev 

New York State Department of Civil Service


The product moment coefficient of correlation between a continuous variate $y$
and a variate $x$ which takes the values 1 and 0 only, is known in psychological
statistics as the point biserial coefficient of correlation. Let $y_i$, $i = 1, \ldots, n$,
be observations on $y$; $y_{1i}$, $i = 1, \ldots, n_1$, be $y$ values which are paired with the
value $x = 1$; $y_{0i}$, $i = 1, \ldots, n_0$, be values paired with $x = 0$; $\bar y$, $\bar y_1$, and $\bar y_0$ be
the corresponding means; and $n = n_1 + n_0$. Then the point biserial coefficient
of correlation may be written

(1) $r = \dfrac{(\bar y_1 - \bar y_0)\sqrt{n_1 n_0/n}}{\left[\sum_{i=0}^{1}\sum_{j=1}^{n_i} (y_{ij} - \bar y)^2\right]^{1/2}}.$

The distribution of $r$ is readily obtained when the $y_i$, $i = 1, \ldots, n$, are
distributed as


( 2 ) 

where 


l 






i = 1,2, — , ni, 
t = «i + 1) ?ii + 2, • • ■, n, 


$\sigma^2$ is the variance of the $y_i$ about the common mean $a$, and $\rho$ is the parameter
which represents the correlation between the $y_i$ and the $x_i$. It is easy to verify
that the statistic in (1) is a maximum likelihood estimate of $\rho$.

It will be convenient to express the two population means in (2) as $\mu_1$ and $\mu_0$
so that

(3) $\mu_1 = a + \rho\sigma\sqrt{n_0/n_1}, \quad \mu_0 = a - \rho\sigma\sqrt{n_1/n_0}.$

Hence

(4) $\rho = \dfrac{\sqrt{n_1 n_0}}{n} \cdot \dfrac{\mu_1 - \mu_0}{\sigma}.$



126 


JOSEPH LEV 


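Now write

(5) $t = \dfrac{\sqrt{n_1 n_0/n}\,(\bar y_1 - \bar y_0)\sqrt{n - 2}}{\left[\sum_{i=0}^{1}\sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2\right]^{1/2}} = \dfrac{\sqrt{n - 2}\; r}{\sqrt{1 - r^2}},$

where $r$ is obtained from (1).

Using (5) we may write $t$ as

$t = \left\{\dfrac{(\bar y_1 - \bar y_0) - (\mu_1 - \mu_0)}{\sigma\sqrt{1 - \rho^2}\,\sqrt{n/(n_1 n_0)}} + \dfrac{\mu_1 - \mu_0}{\sigma\sqrt{1 - \rho^2}\,\sqrt{n/(n_1 n_0)}}\right\} \Bigg/ \left[\dfrac{\sum_{i=0}^{1}\sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2}{(n - 2)\,\sigma^2 (1 - \rho^2)}\right]^{1/2}.$

Therefore $t$ has non-central $t$ distribution [1] with

$f = n - 2$ degrees of freedom and $\delta = \dfrac{\sqrt{n}\,\rho}{\sqrt{1 - \rho^2}}.$

The methods and tables given in [1] may be used to calculate tests of significance
and confidence limits for $\rho$.

When $\rho = 0$, $t$ has Student's distribution, and the statistic $t = \sqrt{n - 2}\; r/\sqrt{1 - r^2}$
may be used to test the hypothesis, $\rho = 0$, by means of the $t$ tables
with $n - 2$ degrees of freedom. The non-central $t$ distribution then determines
the power function of this test.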
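The identities behind (1) and (5) can be confirmed on a small data set. The sketch below is illustrative only: the observations are hypothetical and the function names are not from the paper. It computes $r$ from (1), checks that it agrees with the ordinary product moment correlation between $y$ and the 0/1 variate $x$, and checks that the two expressions for $t$ in (5) coincide.

```python
import math


def point_biserial(y1, y0):
    """Formula (1): r from the two group means and the total sum of squares."""
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    mean1, mean0 = sum(y1) / n1, sum(y0) / n0
    grand = (sum(y1) + sum(y0)) / n
    total_ss = sum((v - grand) ** 2 for v in y1 + y0)
    return (mean1 - mean0) * math.sqrt(n1 * n0 / n) / math.sqrt(total_ss)


def t_from_groups(y1, y0):
    """First expression in (5), based on the within-group sum of squares."""
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    mean1, mean0 = sum(y1) / n1, sum(y0) / n0
    within_ss = (sum((v - mean1) ** 2 for v in y1)
                 + sum((v - mean0) ** 2 for v in y0))
    return math.sqrt(n1 * n0 / n) * (mean1 - mean0) * math.sqrt(n - 2) / math.sqrt(within_ss)


def pearson(x, y):
    """Ordinary product moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical observations paired with x = 1 and with x = 0.
y1 = [12.0, 15.0, 11.0, 14.0]
y0 = [9.0, 10.0, 13.0, 8.0, 7.0]
r = point_biserial(y1, y0)
t = t_from_groups(y1, y0)
n = len(y1) + len(y0)
print(r, t)
```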

Table IV of [1] can be used to calculate confidence limits for $\rho$. If the confidence
interval is to be based on equal tails of the distribution choose a confidence
coefficient $1 - 2\epsilon$. Then compute $\delta(f, t_0, \epsilon)$ and $\delta(f, t_0, 1 - \epsilon)$, where $f = n - 2$,
and $t_0 = \sqrt{n - 2}\; r/\sqrt{1 - r^2}$.

A lower limit for $\rho$ is given by

$\dfrac{\delta(f, t_0, \epsilon)}{[n + \delta^2(f, t_0, \epsilon)]^{1/2}},$

and an upper limit by

$\dfrac{\delta(f, t_0, 1 - \epsilon)}{[n + \delta^2(f, t_0, 1 - \epsilon)]^{1/2}}.$

REFERENCE 

[1] N. L. Johnson and B. L. Welch, “Applications of the non-central t-distribution,” Biometrika,
Vol. 31 (1940), pp. 362-389.



MEAN DEVIATION 127

A NOTE ON KAC'S DERIVATION OF THE DISTRIBUTION OF THE 

MEAN DEVIATION 

By H. J. Godwin

University College of Swansea, Wales 

In a paper on a general class of estimates of deviations, Kac [3] obtained an
expression for the frequency function of the estimate of mean deviation from
the mean in normal samples. He was unable to establish the identity of this
with an expression obtained earlier by me [1]. I now show that the two results
are, in fact, equivalent.

Kac uses the functions $f^{(k)}(x)$, defined as the $k$-fold convolution of



I used the functions (h(r) defined by the recurrence relation 

(1) U'o(x) » 1, G k (x) - f GUt) dl 

Jo 

Now I have shewn elsewhere (2] that the integral of ^ taken through 

the interior of a regular simplex in $k$ dimensions, with its centroid at the origin
and of side $a$, is $\sqrt{k + 1}\; G_k(a/\sqrt{2})$. The relation (1) corresponds to a dissection
of the simplex into sections, which are $(k - 1)$-dimensional simplexes, by joining
the centroid to the vertices and taking sections parallel to the base of each of the
$(k + 1)$ smaller simplexes so formed. If however we take sections parallel
to a base of the whole simplex we get another recurrence relation, viz.

(2) Gy (x) - r di< 

Jo 

Now (2) may be re-written 

a k {ixx) <r iMlM)) _ r e _ ta .*_o« w GUntK (n,, ' m dl 
n* Jo n k ~ l 

whence, by induction, Gk-i(nx)-e~ <Ln ' x ' m = and the equivalence 

of Kac’s result to mine is established. 

REFERENCES 

[1] H. J. Godwin, “On the distribution of the estimate of mean deviation obtained from
samples from a normal population,” Biometrika, Vol. 33 (1945), pp. 254-266.

[2] H. J. Godwin, “A further note on the mean deviation,” Biometrika, Vol. 35 (in the
press).

[3] M. Kac, “On the characteristic functions of the distributions of estimates of various
deviations in samples from a normal population,” Annals of Math. Stat., Vol. 19
(1948), pp. 257-261.





CORRECTION TO “ASYMPTOTIC FORMULAS FOR SIGNIFICANCE 
LEVELS OF CERTAIN DISTRIBUTIONS” 

By A. M. Peiser
New York City 

Professor Henry Scheffé has recently pointed out to me an error in my paper
"Asymptotic formulas for significance levels of certain distributions," which
appeared in Annals of Math. Stat., Vol. 14 (1943), pp. 56-62. In the determination
of the significance levels of Student's t distribution, appeal was made to a
theorem of Cramér which requires independent random variables. The variables
defined at the top of page 61, however, cannot be taken as independent, so that
the theorem does not apply.

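The asymptotic formula (following the notation of the paper)

$$t_{n,p} = y_p + \frac{y_p^3 + y_p}{4n} + o\left(\frac{1}{n}\right),$$

where

$$\Phi(y_p) = 1 - p, \qquad \Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-v^2/2}\,dv,$$

is nevertheless correct. This may be shown directly from the distribution
function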


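$$G_n(x) = \frac{1}{2} + \frac{1}{\sqrt{n\pi}}\,\frac{\Gamma\left(\frac{1}{2}(n+1)\right)}{\Gamma\left(\frac{1}{2}n\right)} \int_0^x \left(1 + \frac{t^2}{n}\right)^{-\frac{1}{2}(n+1)} dt.$$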

Writing 


(■♦ S ™(■♦01 

■",[ 'A 1 A 

and using Stirling's formula, it follows that $G_n(x)$ can be written in the form


aM - * ■+ jst [> + ^r- 1 + ? 

" ¥z, '‘ ~ ,r '" + I, il. 


where $Q_n(t)$ is a bounded function of t and n in each finite interval.

Let $t_{n,p} = y_p + a_n$, where $a_n = o(1)$. Then $G_n(t_{n,p}) = \Phi(y_p) = 1 - p$, and
we have





na n 


HVp + a n ) ~ MV p) 

On 


so that 


(j/p + Qn) 3 + ( y P + On) 
4 


e “!(p p +o„) s 



$$\lim_{n \to \infty} n a_n = \frac{y_p^3 + y_p}{4}.$$


This is the required result. 
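
The limit above says that the upper p point of Student's t with n degrees of freedom is approximately $y_p + (y_p^3 + y_p)/(4n)$. The following is a minimal numerical sketch of that statement, not part of the original note. It assumes Python's standard `statistics.NormalDist` for the normal point $y_p$, and the helper name `t_upper_point` is hypothetical.

```python
from statistics import NormalDist


def t_upper_point(p, n):
    """Approximate upper p point of Student's t with n degrees of freedom.

    Uses the two-term expansion y_p + (y_p**3 + y_p) / (4 n), where
    Phi(y_p) = 1 - p. The o(1/n) remainder is ignored (assumption).
    """
    y = NormalDist().inv_cdf(1.0 - p)  # y_p, the upper p point of the normal law
    return y + (y ** 3 + y) / (4.0 * n)


if __name__ == "__main__":
    # The correction term shrinks like 1/n, so the values approach y_p.
    for n in (10, 30, 120):
        print(n, round(t_upper_point(0.025, n), 4))
```

For moderate n the result can be compared with a printed table of t; the neglected terms are of order $1/n^2$.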





ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the Seattle Meeting of the Institute on 
November 26-27, 1948) 

1. Estimation of the Variance of the Bivariate Normal Distribution. Harry 

M. Hughes, University of California, Berkeley. 

Let $x_1$ and $x_2$ be two random variables normally distributed with known means $m_1$ and $m_2$,
and with common unknown variance $\sigma^2$. Consider an experiment in which the observed
variable is $Y = \sqrt{(x_1 - m_1)^2 + (x_2 - m_2)^2}$. This paper considers the problem of estimating
the parameter $\sigma$ when the observations are grouped. By the method of minimum reduced
chi-square with linear restrictions, two best asymptotically normal estimates are derived.
By minimization of the asymptotic variance of these estimates, the optimum choice of
grouping is found as a function of $\sigma$. For two and for three groups, when it is known or
assumed a priori that $\sigma$ is on a certain finite interval, the optimum grouping is derived
which will minimize the maximum asymptotic variance on that interval. If such interval
is moderately small, it is shown that the optimum grouping is the same as if $\sigma$ were known to
have the value at the upper end of the interval. Finally the effect of using non-optimum
grouping is analyzed.


2. Derivation of a Broad Class of Consistent Estimates. R. C. Davis, Inyokern,
California.

Given a chance vector X with cumulative distribution function $F(X, \theta)$, where $\theta$ is an
unknown parameter vector, a broad class of estimates of $\theta$ is derived having the following
properties: a) any estimate in this class is a consistent estimate of $\theta$; b) any estimate is a
symmetric function of independent observations of the chance vector X. The novel feature
of this class is that no assumptions about the existence of various partial derivatives of a
density function with respect to $\theta$ are made. As a matter of fact not even the existence of
a density function is required, and it is postulated merely that a continuous function of X
for each $\theta$ (in a certain neighborhood of the true $\theta_0$) and of $\theta$ for each X exists which satisfies
a Lipschitz condition in $\theta$. For each such function having a finite first moment an estimate
of $\theta$ is constructed which has the properties a) and b) listed above.


3. Locally Best Unbiased Estimates. Edward W. Barankin, University of 
California, Berkeley, 


Let $\mathcal{P} = \{p_\theta(x);\ \theta \in \Theta\}$ be a family of probability densities in the space $\Omega$ of points x,
and g a function on $\Theta$. Let s be fixed and > 1; call an unbiased estimate of g best at $\theta_0$ if its
s-th absolute central moment (s.a.c.m.) under $p_{\theta_0}$ is (finite and) not greater than the
s.a.c.m., under $p_{\theta_0}$, of any unbiased estimate of g. With a certain integrability postulate,
a necessary and sufficient condition, of finite character, is established for the
existence of an unbiased estimate of g having a finite s.a.c.m. under $p_{\theta_0}$. When such a one
exists, there then exists a unique unbiased estimate which is best at $\theta_0$. The existence
condition defines the s.a.c.m. of the best estimate explicitly as the l.u.b. of a set of numbers;
in particular we obtain immediately a generalization of the Cramér-Rao inequality. Also,
when it exists, the best unbiased estimate is explicitly constructed from the function g

‘' 2 "* , “ d “ d Al '°- * 




4. Some Problems Related to the Distribution of a Random Number of Random
Variables. Edward Paulson, University of Washington, Seattle.


Let $\{x_i\}$ $(i = 1, 2, 3, \cdots)$ be a set of independent random variables with identical distributions,
with $E(x) = a$ and $\sigma^2(x) = b$ $(0 < b < \infty)$. Let N be a positive integral-valued random
variable with distribution $F_\lambda(N)$ depending on a parameter $\lambda$, where $E(N) = A_\lambda$, and
$\sigma^2(N) = B_\lambda$ $(0 < B_\lambda < \infty)$. Now let $T_N = x_1 + x_2 + \cdots + x_N$. The limiting distribution
as $\lambda \to \infty$ of

$$\frac{T_N - aA_\lambda}{\sqrt{a^2 B_\lambda + bA_\lambda}}$$

has been investigated by Robbins (Proc. Nat. Acad. Science, Vol. 34 (April 1948), pp. 162-163)
for several different sets of conditions on the distribution of N. It can be shown that
analogous results will hold if instead of $T_N$ we consider a more general statistic $T'_N$,
whose conditional distribution with respect to the variable N
is such that there exist constants $a_N$ and $b_N$ so that


lim 


EI exp 


\ Vl>iN )_ 


= Ht) 


uniformly in any finite t interval, where h(t) is a characteristic function. Returning to the
statistic $T_N$, it can be shown that there exists an asymptotic expansion in powers of $\lambda^{-1/2}$


with remainder 0(X~ , ‘ Wl1 ) for P 


- 


s < ul when the following conditions are 

(.Va’/U + 6Ax J 

satisfied: (1) the distribution function of x has a non-zero absolutely continuous component,
(2) $E(|x|^k) < \infty$, and (3) $\lambda \to \infty$ through integral values, and $F_\lambda(N)$ is the $\lambda$th
convolution of a random variable n such that $E(n^k) < \infty$.


5. Asymptotic Expansions for the Distribution of Certain Likelihood Ratio
Statistics. Albert H. Bowker, Stanford University.

Asymptotic expansions of the "Cramérian" type are derived for the distribution of
likelihood ratio statistics given by Wilks for testing various hypotheses about means,
variances, and covariances of a normal multivariate distribution. The point of departure
is Wilks' result that minus twice the logarithm of the likelihood ratio has the $\chi^2$ distribution;
terms in $1/N, 1/N^2, \cdots$ may also be expressed in terms of the $\chi^2$ distribution. In addition,
asymptotic expansions of the "Fisher-Cornish" type are obtained for the percentage points
and for a transformation of the statistic to a $\chi^2$ variate.


6. On a Problem of Confounding in Symmetrical Factorial Design. Esther
Seiden, University of California, Berkeley.

Let $m_3(r, s)$ be the maximum number of factors that it is possible to accommodate in a
symmetrical factorial experiment in which each factor is at $s = p^n$ levels (p being any
positive prime number, n a positive integer) and each block is of size $s^r$, without confounding
any degrees of freedom belonging to any interaction involving 3 or lesser number of
factors.

R. C. Bose proved in a paper "Mathematical theory of factorial design," Sankhyā, Vol. 8
(1947), pp. 107-166, that the following inequality holds:

$s^2 + 1 \le m_3(4, s) \le s^2 + s + 2$. This gives in case s = 4, $17 \le m_3(4, 4) \le 22$. It is now
proved that $m_3(4, 4) = 17$.

The proof consists in showing that the maximum number of points, no three collinear,
which can be chosen in a finite projective space PG(3, 4) cannot exceed 17, which according
to a proof of R. C. Bose is equivalent to the statement that $m_3(4, 4)$ cannot exceed 17.





7. Some Bounded Significance Level Tests of Whether the Largest Observations
of a Set are Too Small (Preliminary Report). John E. Walsh, Santa
Monica, California.

A set of n observations are given which satisfy: (1) They are independent and from
continuous symmetrical populations; (2) The r largest observations are from populations
with median $\theta$ while the remaining observations are from populations with median $\varphi$. It
is required to test whether $\theta < \varphi$. Let $x(1), \cdots, x(n)$ denote the observations arranged
in increasing order of magnitude. For r = 1 tests of the form: Accept $\theta < \varphi$ if
$x(n) < 2x(w_\alpha) - x(i)$, where $\alpha = \Pr[x(n) + x(i) < 2\theta \mid \theta = \varphi]$ and $w_\alpha$ is the smallest integer
satisfying $\Pr[x(w_\alpha) > \theta \mid \theta = \varphi] < \alpha$, can be obtained for n > 15. Exact significance levels
can be obtained by assuming a sample from a specified population (e.g. normal). On the
basis of (1)-(2) alone, the significance level never exceeds $2\alpha$. For large n, tests can be obtained
for any r if the observations satisfy the additional weak condition: (3) The tail order
statistics are approximately independent of the central order statistics; also the variances
of the tail order statistics are either very large or very small compared with the variances
of the central order statistics. The test is: Accept $\theta < \varphi$ if mnxfxfi,) | x l u -- e );
l < k < s < t] < , xoAere < t„ +1 , j> < j\+i ” r — 1, ie„ is the emailed utiegt r cat . 

isfymg Pr[a:(tu„) > 6 I 0 = y] £ a, and a = Pr(inftx[je(u) + - jt)] < 2 9 i 0 * yj. For 

large n the significance level is approximately $\alpha$ but is $\le 2\alpha$ for all n. The power function
$\to 1$ as $\varphi - \theta \to \infty$ and $\to 0$ as $\varphi - \theta \to -\infty$.


8. Determination of Optimal Test Length to Maximize the Multiple Correlation.
Paul Horst, University of Washington, Seattle.

If the lengths of the tests in a battery are altered, their intercorrelations and their validities
or correlations with a criterion are also altered. Consequently, the multiple correlation
of the battery with the criterion will also be altered. These changes are a function of the
reliabilities of the tests. Suppose we have given from a set of experimental data (1) the
time allowed for each test in the battery, (2) the reliability of each test, (3) the intercorrelations,
and (4) the validities of all the tests. If we specify the overall testing time we are
willing to allow for the tests in the future, we can determine the amount by which each test
must be altered in order to give the maximum multiple correlation with the criterion. The
method, together with numerical examples and the mathematical proof, is presented.
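
The dependence of reliability and validity on test length can be illustrated with the classical Spearman-Brown relations. The abstract does not say which formulas the author uses, so the sketch below is only an assumption about the kind of relation involved; the function names are hypothetical.

```python
import math


def lengthened_reliability(reliability, k):
    """Spearman-Brown: reliability of a test whose length is multiplied by k."""
    return k * reliability / (1.0 + (k - 1.0) * reliability)


def lengthened_validity(validity, reliability, k):
    """Correlation with a fixed criterion after the test length is multiplied by k.

    Classical result: the validity is scaled by sqrt(k / (1 + (k - 1) * reliability)).
    """
    return validity * math.sqrt(k / (1.0 + (k - 1.0) * reliability))


if __name__ == "__main__":
    # Doubling a test with reliability 0.5 and validity 0.3 (illustrative numbers).
    print(lengthened_reliability(0.5, 2))
    print(lengthened_validity(0.3, 0.5, 2))
```

Finding the allocation of a fixed total testing time that maximizes the multiple correlation would require an optimization over all the k values at once, which this sketch does not attempt.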


9. Some Numerical Comparisons of a Non-Parametric Test with other Tests.
F. J. Massey, University of Oregon, Eugene.

Let F(x) be the cumulative distribution function of a random variable X, and let
$x_1 < x_2 < \cdots < x_n$ be the results of n independent observations ordered as to size.

Define $S_n(x) = 0$ if $x < x_1$;

$= k/n$ if $x_k \le x < x_{k+1}$;

$= 1$ if $x_n \le x$.

To test the hypothesis $H_0$: $F(x) = F_0(x)$, where $F_0(x)$ is completely specified, use the

«it™ „ iMt JJ,„„« | s, w _ ,, (I) ! > Uhoioo or X OMOTJ*. u ,0 Itin.l 
oumok.ll ”" lind 01 mor ■l * cls “ 1 can I* ulmihud 
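
A statistic built from the step function $S_n(x)$ and a fully specified $F_0(x)$ can be computed as in the following sketch. It assumes the criterion is the maximum absolute deviation $\max |S_n(x) - F_0(x)|$, which is the usual choice but is not legible in the text above; the function name is hypothetical.

```python
def max_deviation(sample, cdf0):
    """Return max over x of |S_n(x) - F0(x)| for the empirical step function S_n.

    The maximum is attained at a sample point, just before or just after
    the jump of S_n, so only those 2n candidates are checked.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for k, x in enumerate(xs, start=1):
        f = cdf0(x)
        d = max(d, k / n - f, f - (k - 1) / n)
    return d


if __name__ == "__main__":
    # Illustrative data tested against the uniform distribution on (0, 1).
    uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
    print(max_deviation([0.1, 0.4, 0.7], uniform_cdf))
```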





10. On the Deviation of Extreme Values. W. J. Dixon, University of Oregon,
Eugene.


Lei x{i j be the ith observation in order of magnitude in a sample of size n. The distn- 


hutirm of R 


x( 2 ). 


is obtained explicitly for samples from a rectangular distiilmtum 


x(n) 

x(n) — x(l) 

and for n = 3, 4, 5, for samples from a normal distribution. Percentage values of R for
values of n up to 30 are presented. Generalizations of R are indicated.


11. The Optimum Size of Interval for Making Measurements of a Rocket’s 
Angular Velocity. Edward A. Fay, University of California, Berkeley. 

Over a given range of time $0 < \tau < T$, the angular velocity of a rocket's spin is adequately
represented by a polynomial $\xi(\tau)$ of given degree s - 1 but with unknown coefficients.
The rocket's angular acceleration and the angle through which it spins in a given time-interval
may then be obtained respectively by differentiating and integrating $\xi(\tau)$. Let
n be an integer > s, let t = T/n, and let $\eta_i$ be the angle through which the rocket turns in the
interval $(i - 1)t < \tau < it$. While $\xi(\tau)$ and $\xi'(\tau)$ cannot be directly observed, the angles
$\eta_1, \eta_2, \cdots, \eta_n$ can. Let $Y_i$ be an observation on $\eta_i$, and assume that $Y_1, Y_2, \cdots, Y_n$
are independent homoscedastic variables. The $Y_i$ may then be combined by the method
of least squares to obtain best linear estimates $A'(\tau, t)$ and $A''(\tau, t)$ of $\xi(\tau)$ and $\xi'(\tau)$. The
choice of t is at the observer's disposal. For the cases s = 2, 3, 4, and for the cases that the
common variance of the $Y_i$ is (a) independent of t or (b) proportional to t, an expression is
derived for the variance of $A''(\tau, t)$, and the maximum value of that variance over the
range of $\tau$ is minimized with respect to t. The method is of much more general application.


12. Stationary Time Series Analysis and Common Stock Price Forecasting.
Zenon Szatrowski, University of Oregon, Eugene.

The objective of this paper is to present a statistical procedure of practical value in the
problem of extracting information from the past behaviour of economic time series, information
to be used in projecting future patterns. The author feels that his approach yields
results closer to reality than the techniques described by Herman Wold, M. G. Kendall,
H. T. Davis, and in particular, the technique of "disturbed harmonics" used by G. U. Yule.
The idea of the proposed technique can be described by examining the autoregression
scheme, which seems to be considered the most desirable by the above men. A simple
example of such a scheme is the equation

$$u_{t+2} = -a u_{t+1} - b u_t + \epsilon_{t+2},$$

where the u's are the time series values and the $\epsilon$'s are random elements. The above linear
relationship, when determined either directly or through an empirical correlogram (for
which data is usually inadequate) is a kind of an average relationship. It may be as inappropriate
in estimating future values of a time series as would be an average in estimating
the level of a series with a pronounced trend.

The author proposes using derived time series to shed light on the nature of the changes
in the parameters under consideration. Such derived series could be estimates of the a's
and b's for successive time periods. The author has found that projections of common
stock price fluctuations were improved considerably when the changing nature of the
cyclical pattern was taken into account. This was done by constructing derived time
series, "moving" estimates of the amplitude, period and phase of the dominant harmonics.




The author points out that the above approach has shown promise in commodity prices
as well as common stocks. The value of this approach in forecasting lies in the facts that
(1) it does not require forecasts of other series and (2) it is based on the realistic assumption
that history repeats itself but with variations, variations which may be taken into account
through appropriate models.
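
One simple way to form a derived series of coefficient estimates for the scheme $u_{t+2} = -a u_{t+1} - b u_t + \epsilon_{t+2}$ is to refit it by least squares on successive windows of the data. The sketch below is an illustration under that assumption, not the author's procedure; the window length, step, and function names are hypothetical.

```python
def fit_ar2(u):
    """Least-squares estimates (a, b) in u[t+2] = -a*u[t+1] - b*u[t] + error."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(len(u) - 2):
        x1, x2, y = u[t + 1], u[t], u[t + 2]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        r1 += x1 * y
        r2 += x2 * y
    det = s11 * s22 - s12 * s12          # determinant of the normal equations
    c1 = (r1 * s22 - r2 * s12) / det     # coefficient of u[t+1], equal to -a
    c2 = (r2 * s11 - r1 * s12) / det     # coefficient of u[t],   equal to -b
    return -c1, -c2


def moving_estimates(u, window, step):
    """Derived series: (a, b) estimated on successive windows of the data."""
    return [fit_ar2(u[i:i + window])
            for i in range(0, len(u) - window + 1, step)]


if __name__ == "__main__":
    # Noise-free illustrative series with a = -1.2, b = 0.5.
    u = [1.0, 0.5]
    for _ in range(28):
        u.append(1.2 * u[-1] - 0.5 * u[-2])
    print(fit_ar2(u))
```

With real data the successive pairs (a, b) would be examined for drift, which is the kind of changing relationship the abstract describes.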

13. Distribution of the Number of Schools of Fish Caught Per Boat. J. Neyman,
University of California, Berkeley.

Let $\lambda$ be the average number of schools of fish per unit area of a fishing ground A. Let a
be any area partial to A, and let $p(n, a, \lambda)$ denote the probability that exactly n schools of
fish will be found within a. At time t = 0 a boat begins scouting for fish in A traveling at
constant speed v. It is assumed that all schools of fish within distance r of the boat are
detected and none is detected at a greater distance. If $s \ge 1$ schools are detected then they
are caught in turn, the catching of one school taking up exactly h hours. X(t) denotes the
random variable representing, for each $t \ge 0$, the number of schools caught up to time t
including the one which may be in the process of being caught at the moment t. Probability
distribution of X(t) is given by the formula

$$P[X(t) \le k] = \sum_{m=0}^{k} p[m, 2rv(t - kh), \lambda]$$

for k = 0, 1, 2, $\cdots$, n - 1, where n - 1 is the greatest integer smaller than t/h. Of course
$P[X(t) \le n] = 1$. This result is easily generalized for the joint distribution of catches of
several boats fishing in the same area so that their paths do not cross. Assuming specific
functions to represent $p(m, a, \lambda)$ formulae may be obtained to estimate the parameters $\lambda$
and rv.
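
The formula for $P[X(t) \le k]$ becomes computable once a specific form is assumed for the probability of m schools in an area a. The sketch below assumes a Poisson law with mean $\lambda a$, which is only one possible choice and is not stated in the abstract; the function names are hypothetical.

```python
import math


def poisson_prob(m, area, lam):
    """Assumed form of p(m, a, lambda): Poisson with mean lambda * a."""
    mean = lam * area
    return math.exp(-mean) * mean ** m / math.factorial(m)


def catch_cdf(k, t, h, r, v, lam):
    """P[X(t) <= k] = sum over m = 0..k of p(m, 2 r v (t - k h), lambda).

    Here n - 1 is the greatest integer smaller than t / h, so n = ceil(t / h),
    and the probability is 1 for k >= n.
    """
    n = math.ceil(t / h)
    if k >= n:
        return 1.0
    area = 2.0 * r * v * (t - k * h)  # strip swept while the boat is scouting
    return sum(poisson_prob(m, area, lam) for m in range(k + 1))


if __name__ == "__main__":
    # Illustrative values: 10 hours at sea, 2 hours per catch.
    for k in range(6):
        print(k, round(catch_cdf(k, t=10.0, h=2.0, r=0.5, v=1.0, lam=0.1), 4))
```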

14. Some Problems in Fishery Research to which Statistical Methods are
Applicable (Preliminary Report). Ralph P. Silliman, U. S. Fish and
Wildlife Service, Seattle.

One of the most difficult problems is the obtaining of a random sample of a fish population.
Rarely are such populations randomly distributed over any area, and the samples
must often be taken from the catches of fishing vessels which do not uniformly cover even a
part of the area of distribution of the population. Many distributions of variables found in
fishery research are not normal, and statistical methods based on the normal distribution
can be applied only through the use of unsatisfactory transformations. Since fishery
research is largely observational in technique, data reflecting the concurrent effect of
several variables are usually obtained. Although the present methods of multiple correlation
and regression can be used in some instances to measure the relative effect of the
separate variables, there are many situations in which these methods do not yield good
results. Finally, many data used in fishery research must be adjusted before use, and
existing methods do not give good measures of the expected variability of such adjusted
data. Examples of specific problems are found in the distribution of deliveries and the
variations in catch of Columbia River chinook salmon.

15. The Application of the Hypergeometric Distribution to Problems of Esti¬ 
mating and Comparing Zoological Population Sizes. Douglas Chapman, 
University of California, Berkeley. 

Estimates and tests of the $\chi^2$ type, as developed by Neyman, are adapted to sampling
without replacement from a finite population. These results are applied to problems of





estimation and comparison of zoological population sizes as determined by sampling procedures.
For single samples the bias and variance of different estimators are compared.
Finally some numerical calculations are made for various population and sample sizes to
determine how different sample sizes and different methods of analysis affect the size of the
critical region which is necessarily an approximation to the desired size. For some of
these the power of the test is considered.

16. Extension to Multivariate Case of Neyman's Smooth Test with Astronomical
Application. Elizabeth L. Scott, University of California, Berkeley. 

It is more or less generally accepted that the distribution of extra-galactic nebulae in 
space is not uniform in the small. In particular, counts in small cubes show distinct signs 
of contagion. On the other hand, it is not settled whether or not lack of uniformity in the 
large exists. One way of making this statement precise is to assert that the power series
expansion of the logarithm of the probability density of the two angular coordinates of the 
nebulae within a given large area on the unit sphere does not contain low order terms. 
In fact, any such low order terms could be interpreted as determining “trends” or what 
could be described as lack of uniformity in the large. From this point of view, uniformity 
in the large may be tested by a two dimensional Neyman Smooth Test for goodness of fit. 

Let $\{\pi_{ij}(x, y)\}$ be a sequence of polynomials in x and y ortho-normal for $|x| < a$ and
$|y| < b$. If $x_k$ and $y_k$ are the coordinates of the kth out of n nebulae counted within the
rectangle (-a, a), (-b, b) then the smooth test of mth order consists of rejecting the hypothesis
of uniformity in the large when

$$\sum_{i+j=1}^{m} \left( \sum_{k=1}^{n} \pi_{ij}(x_k, y_k) \right)^2 > \chi^2_\epsilon,$$

where $\chi^2_\epsilon$ is the tabled value of $\chi^2$ with m(m + 3)/2 degrees of freedom.

17. A Mathematical Theory of Vitamin A Metabolism in Fish (Preliminary
Report). Norman E. Cooke, Vancouver, B.C.

Several possible hypotheses for vitamin A metabolism in fish are developed from simple
postulates. These hypotheses are tested (by least squares method) against experimental
data in an attempt to deduce the correct mechanism.

18. The Interactance Hypothesis between Populations. Stuart C. Dodd, 
University of Washington, Seattle. 

The hypothesis of interactance between human populations, or of demographic gravitation,
is that the number of interactions between two communities (or other groups) tends
to vary directly with the product of the two populations and their "specific coefficients"
and the overall duration and tends to vary inversely with the intervening distance and the
average duration of an interact. The hypothesis is tested by isolating factors and measuring
their correlation with the amount of interacting in the pairs of a set of V communities.

This hypothesis is supported by studies of telephoning; news circulating; travel by bus,
train, or plane; R. R. express; college attendance; intermarrying; etc. Further lists of
interhuman actions are suggested for investigation.

A new corroborating bit of data comes from a poll by the Washington Public Opinion
Laboratory in a Seattle housing project where negro-white relations threatened violence.
The tension units of verbal interaction (defined as one anti-negro opinion asserted by one
white person) were observed to decrease inversely with a power of the distance from a rape
site. The observed tension correlated with the formulas or curves predicting that tension
at $\rho$ = .94 and passed the chi-square test at the one per cent level. The tension is dimensionally
analyzed as a social force and social energy.





19. The Employment of Marked Members in the Estimation of Animal Populations.
Milner B. Schaefer, U. S. Fish and Wildlife Service, Honolulu,
T. H.

The estimation of population numbers by marked members is an important technique in
fisheries research. The number N of individuals in the population, of which T are known
to be marked, may be estimated from a sample of n of which t are found to be marked.
Several estimates are available, all of which reduce to $N = nT/t$ when the numbers are all
large, but more precise formulae should be used when the numbers are not all large. An
estimate of the variance of N has been derived by Karl Pearson (Biometrika, Vol. 20 (1928),
pp. 149-174) on the basis of inverse probabilities. The sampling error may also be measured
by means of confidence intervals. Formulae have been developed for estimating N from
repeated samples of the same population, but no very suitable estimates of the sampling
error are available in this case. For some migratory fishes marked at a point on their
migration path and sampled later at another point, there exists a correlation between time
of marking and time of recovery in the subsequent samples. In such case, the total number
of fish marked or drawn in the subsequent samples cannot in general be regarded as random
samples of the population. Where numbered tags are employed as marks, so the fish may
be individually identified both when marked and recovered, a method of estimating N in
this case also is suggested.
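
The large-sample estimate $N = nT/t$ can be written out directly. The sketch below also includes, for comparison, a commonly used small-sample modification usually attributed to Chapman; that second estimator is an addition for illustration and is not taken from the abstract. Function names and numbers are hypothetical.

```python
def simple_estimate(n, T, t):
    """Large-sample estimate N = n T / t.

    n: sample size, T: marked individuals in the population,
    t: marked individuals found in the sample (must be positive).
    """
    return n * T / t


def modified_estimate(n, T, t):
    """Small-sample modification (n + 1)(T + 1) / (t + 1) - 1.

    Not from the abstract; shown only to illustrate that the alternatives
    agree with n T / t when all the numbers are large.
    """
    return (n + 1) * (T + 1) / (t + 1) - 1


if __name__ == "__main__":
    # Illustrative counts: 100 marked fish, sample of 200, 20 recaptures.
    print(simple_estimate(200, 100, 20))
    print(modified_estimate(200, 100, 20))
```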

20. Non-Response and Repeated Call-Backs in Sampling Surveys. Z. W.
Birnbaum and Monroe G. Sirken, University of Washington, Seattle.

In opinion-polls and other sampling surveys, a response can only be obtained from those
individuals of a sample who are available for interviewing. Let $p_{\cdot 1}$ be the probability that
an individual chosen at random from the population answers "yes" to a question, $p_{1\cdot}$ that
an individual is available for interviewing, and $p_{11}$ that an individual is available and
answers "yes." Usually one wishes to estimate the parameter $p_{\cdot 1}$, but from a sample it
is only possible to estimate $p_{11}/p_{1\cdot} = p'$ = the probability that an individual answers "yes"
if he is available. Thus the total error in estimating $p_{\cdot 1}$ from a sample contains two components:
the bias $p_{\cdot 1} - p'$ and the sampling error. In this paper a technique is presented
in which individuals not available at a call are called upon repeatedly, up to k times. It
is shown how, for a given upper bound of the total error at a prescribed probability level
and a given k, it is possible to minimize the cost of the survey by optimizing the relationship
between the greatest possible bias and the sampling error.


(Abstracts of papers presented at the Cleveland Meeting of the Institute on December
27-30, 1948.)


21. A Necessary Condition for a Certain Class of Characteristic Functions
(Preliminary Report). Eugene Lukacs, NOTS, Inyokern, California and
Our Lady of Cincinnati College, Cincinnati, Ohio.


Let^(i) = j^l ^^1 — be the reciprocal of a polynomial without 

multiple roots. The following necessary condition is derived which $\varphi(t)$ has to satisfy in
order to be the Fourier transform (characteristic function) of a distribution.





If $\varphi(t)$ is the Fourier transform of a distribution, then

1) The polynomial $1/\varphi(t)$ has no real roots. If b + ia ($a \ne 0$, $b \ne 0$) is a root then -b + ia is also a root.
That is, the roots of $1/\varphi(t)$ are either located on the imaginary axis or are symmetrical to this
axis.

2) If b + ia ($a \ne 0$) is a root then there exists also at least one root $i\alpha$ so that sign $\alpha$ =
sign a and $|\alpha| \le |a|$.

As a particular case one obtains the well known fact that $(1 + t^4)^{-1}$ cannot be a characteristic
function.
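
The two conditions are easy to check mechanically once the roots of the polynomial are known. The sketch below is an illustration, not part of the abstract: it takes the roots as given complex numbers, uses a hypothetical tolerance for floating-point comparisons, and applies the check to $1 + t^4$ and to $1 + t^2$ (whose reciprocal is the characteristic function of the Laplace distribution).

```python
import cmath
import math


def satisfies_necessary_conditions(roots, tol=1e-9):
    """Check conditions 1) and 2) for the roots of the polynomial 1/phi(t)."""
    def has_root(w):
        return any(abs(w - z) <= tol for z in roots)

    for z in roots:
        b, a = z.real, z.imag
        # 1) no real roots, and roots off the imaginary axis come in pairs
        if abs(a) <= tol:
            return False
        if abs(b) > tol and not has_root(complex(-b, a)):
            return False
        # 2) some purely imaginary root i*alpha with the sign of a and |alpha| <= |a|
        if not any(abs(w.real) <= tol and w.imag * a > 0
                   and abs(w.imag) <= abs(a) + tol for w in roots):
            return False
    return True


if __name__ == "__main__":
    quartic_roots = [cmath.exp(1j * math.pi * (2 * k + 1) / 4) for k in range(4)]
    print(satisfies_necessary_conditions(quartic_roots))   # roots of 1 + t^4
    print(satisfies_necessary_conditions([1j, -1j]))       # roots of 1 + t^2
```

Since the conditions are only necessary, a result of `True` does not by itself show that the reciprocal is a characteristic function.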

22. Precision of Estimates from Samples Selected under Marginal Restrictions
(Preliminary Report). Clifford J. Maloney, Camp Detrick, Frederick,
Maryland.

Formulas are derived for estimates and for their variances computed from samples drawn
at random subject only to marginal restrictions from populations classified by several
characters, and estimates are made of the efficiency of such sampling plans compared to
sampling with complete stratification or sampling completely at random. By means of two
simple but general theorems it is shown that the variances are independent of the individual
values of the character being sampled for in the population and in the sample and depend
only on the first two moments for each cell of the population. It is shown that in the large
sample approximation a practical scheme for actually drawing such samples can be obtained
by drawing a sample of size n entirely at random and using the results of Deming and
Stephan (Annals of Math. Stat., Vol. 11 (1940), p. 427) to adjust the sample marginal totals
to the specified values. Deficient cells will of course be filled up by additional drawings.
A measure is given of the relative loss of information in sampling with marginal restrictions
on the sample cell numbers compared to sampling with complete stratification. If $a_{ij}$
represents the population mean in the ijth cell, $r_i$ the population mean in the ith row and
$c_j$ the population mean in the jth column, and if $a_{ij}$ is of the form $a_{ij} = a + r_i + c_j$, then
marginally restricted sampling is as efficient as sampling with complete stratification. For
arbitrary $a_{ij}$ a measure of the relative efficiency compared to sampling completely at random
is given by the relative degrees of freedom for the sample cell numbers. A comparison
with other possible sampling procedures is given.
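
The adjustment of sample cell counts to specified marginal totals, in the manner of Deming and Stephan, is usually carried out by alternately rescaling rows and columns. The sketch below shows that iteration for a two-way table; it is an illustration with hypothetical names and numbers, and it assumes the row and column targets have the same grand total.

```python
def adjust_to_margins(table, row_targets, col_targets, iterations=50):
    """Iterative proportional fitting of a two-way table to given margins.

    Each pass rescales every row to its target total, then every column.
    Returns a new table; the input is left unchanged.
    """
    cells = [list(row) for row in table]
    for _ in range(iterations):
        for i, row in enumerate(cells):
            total = sum(row)
            if total > 0:
                cells[i] = [x * row_targets[i] / total for x in row]
        for j in range(len(col_targets)):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= col_targets[j] / total
    return cells


if __name__ == "__main__":
    # Illustrative 2 x 2 sample counts adjusted to margins (40, 60) and (50, 50).
    fitted = adjust_to_margins([[10, 20], [30, 40]], [40, 60], [50, 50])
    for row in fitted:
        print([round(x, 3) for x in row])
```

The fitted cells are generally not integers, which is consistent with the remark that deficient cells are filled up by additional drawings.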

23. Properties of Maximum- and Quasi-Maximum Likelihood Estimates of 
Parameters of a System of Linear Stochastic Difference Equations with 
Serially Correlated Disturbances ( Preliminary Report). Herman Rubin, 
Cowles Commission, The University of Chicago. 

Let A*,*,' “ u,' be a complete Bystem of linear stochastic difference equat-ons, *i -> 
(lb i *0, U> jointly dependent, si predetermined, Let us suppose u[ + 3, u ui -1 = v', , where 
the random vectors $v_t$ are serially independent and have mean zero. If the vectors $v_t$
have the same Gaussian distribution, and the system is identified, we can obtain maximum-likelihood
estimates; if the distributions are not identical Gaussian, quasi-maximum-likelihood
estimates result. The identification problem is a special case of that with independent
$u_t$ and bilinear restrictions on some $A_j$, if the restrictions on $A_j$ are linear or bilinear.
As in that case, we may have multiple identification. However, the special aspects of this
type of system yield some help in the discussion of the identification problem. We also
observe that if the system is identified, we obtain consistency and asymptotic normality
of the estimates under the same conditions as with serially independent u's for Au, .

24. The Computation of Maximum Likelihood Estimates of Parameters of a 
System of Linear Stochastic Difference Equations with Serially Correlated 





Disturbances. Herman Chernoff, Cowles Commission, The University
of Chicago.

Consider the structural equations $A x_t' = u_t'$ where the vector $x_t = (y_t, z_t)$, $y_t$ are the
jointly dependent, and $z_t$ the predetermined variables and where $u_t$ are serially correlated.
In particular assume that the disturbances $u_t$ satisfy the simple Markoff Process
u' + B, ui*;., = where in is a stationary aerially uncorrelatad Gaussian Process with aero 
mean. Then we have A ai x’, + fl* uAurfl-t = . The estimates of Zf™ and can be 

simply expressed in terms of those of A. It is shown that iterative gradient methods of
maximization require about 2 to 3 times as much work per iteration as in the serially uncorrelated
case. To apply the Newton Method about 8 times as much work per iteration
is required. The Newton Method uses the second order terms of the expansion of the log
of the likelihood in terms of the independent parameters of A, and these can be used to
obtain estimates of the asymptotic covariance of the estimates.

25. Test Criteria for Hypotheses of Symmetry and Definiteness of a Regression
Matrix for Demand Functions. Uttam Chand, University of North
Carolina.

The importance of relations between two sets of variates (e.g. the study of relations of
the prices to the quantities of several commodities) invariant under linear transformations
of one set of variates contragredient to those of the other was first pointed out by Hotelling.
In the study of related demand functions no suitable statistical tests have existed for
testing the hypotheses of symmetry and negative definiteness of the regression matrix of
prices on quantities. The test proposed here for the hypothesis of symmetry is exact and
invariant under all contragredient transformations. A separate test studied for both
symmetry and negative definiteness satisfies the property of invariance but its distribution
depends on a nuisance parameter which is the non-zero root of a certain determinantal
equation. The likelihood ratio criterion under the hypothesis of symmetry leads to a multilateral
matric equation which represents $\frac{1}{2}p(p+1)$ equations of the third degree in $\frac{1}{2}p(p+1)$
unknown regression coefficients for the p-variate case, and does not admit of a unique
solution.


26. The Distribution of Extreme Values in Samples whose Members are Sto¬ 
chastically Dependent. Benjamin Epstein, Wayne University, Detroit. 

In this paper the following problem is considered: to find the distribution of largest
and smallest values in samples of size n drawn from a random process subject to the following
conditions:

(i) observations $x_1, x_2, \ldots, x_n$ are taken in order from some random process,

(ii) the random process is such that successive observations $x_t$ and $x_{t+1}$ are jointly
dependent. The joint distribution is described analytically, independently of t,
by a two-dimensional d.f.

$F_2(x, y) = \mathrm{Prob}(x_t \le x,\ x_{t+1} \le y), \quad 1 \le t \le n - 1,$

(iii) $F_2(x, y) = F_2(y, x)$,

(iv) any other pairs of observations $(x_i, x_{i+j})$, $1 \le i \le n - 1$, $2 \le j \le n - 1$, are assumed
to be independent.

The results in this paper generalize the special situation where all observations are independent.
More general cases than those covered by (i)-(iv) are briefly considered.
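
As an illustration only (the construction is an assumption made here, not taken from the paper), a process of the kind described can be built by setting x_t = max(u_t, u_{t+1}) with u_1, ..., u_{n+1} independent and uniform on (0, 1). Adjacent observations share one u and have a symmetric joint d.f., observations two or more steps apart are independent, and the largest of x_1, ..., x_n equals the largest of the u's, so that Prob(largest <= x) = x^(n+1). The same number of independent observations with the same marginal d.f. x^2 would give x^(2n). The function names below are hypothetical.

```python
import random


def largest_value_cdf_dependent(x, n):
    """Exact Prob(max(x_1..x_n) <= x) when x_t = max(u_t, u_{t+1}), u uniform on (0, 1)."""
    return x ** (n + 1)


def largest_value_cdf_independent(x, n):
    """Prob(max <= x) for n independent observations with the same marginal d.f. x**2."""
    return x ** (2 * n)


def simulate_largest_value_cdf(x, n, trials, seed=0):
    """Monte Carlo estimate of Prob(max(x_1..x_n) <= x) for the dependent process."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u = [rng.random() for _ in range(n + 1)]
        sample = [max(u[t], u[t + 1]) for t in range(n)]
        if max(sample) <= x:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, x = 5, 0.9
    print("dependent, exact:    ", largest_value_cdf_dependent(x, n))
    print("dependent, simulated:", simulate_largest_value_cdf(x, n, 20000))
    print("independent, exact:  ", largest_value_cdf_independent(x, n))
```

In this toy case positive dependence between neighbours makes a small maximum more likely than it would be under independence.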



ABSTRACTS OF PAPERS


139 


27. On Age-Dependent Stochastic Branching Processes. Richard Bellman
and Theodore E. Harris, Stanford University and The RAND Corporation,
Palo Alto and Santa Monica, California.

An initial particle has a random life length T with c.d.f. $G(t)$. At death it is replaced
by a random number N of similar particles; $P(N = n) = q_n$. Particles produced have the
same distributions of life-length and replacement as the original one.

Let $z(t)$ = number of particles at time t, $h(s) = \sum_{n=0}^{\infty} q_n s^n$, $F(s, t) = \sum_{n=0}^{\infty} P(z(t) = n)\, s^n$. The
integral equation

$F(s, t) = \int_0^t h[F(s, t - y)]\, dG(y) + s[1 - G(t)]$

uniquely determines $F(s, t)$. When suitable restrictions are put on $h(s)$ and $G(t)$, results of Feller can be applied
to study the asymptotic behavior of the moments of $z(t)$, which satisfy linear integral
equations of the convolution type, and further special results on the moments can be
obtained. The condition $\sum n q_n > 1$ and certain further restrictions insure that $z(t)/e^{bt}$
converges in probability as $t \to \infty$, where b is a certain constant. The m.g.f. of the
limiting distribution satisfies the equation

$\phi(s) = \int_0^{\infty} h[\phi(s e^{-by})]\, dG(y).$

Further restrictions imply that $\phi(s)$ is analytic in a neighborhood of $s = 0$, and that the corresponding
distribution is absolutely continuous.
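
The process described above can be simulated directly. The following is a minimal sketch, not the authors' method: the function names are hypothetical, and the demonstration assumes exponential life lengths with unit rate and exactly two replacement particles at each death. In that special case the mean of z(t) is known to be e^t, which gives a simple check on the simulation.

```python
import math
import random


def simulate_z(t_end, draw_lifetime, draw_offspring, rng):
    """Return z(t_end), the number of particles alive at t_end, from one particle born at 0."""
    alive = 0
    pending_births = [0.0]  # birth times of particles not yet processed
    while pending_births:
        birth = pending_births.pop()
        death = birth + draw_lifetime(rng)
        if death > t_end:
            alive += 1  # still living at the observation time
        else:
            # replaced at death by a random number of new particles
            pending_births.extend([death] * draw_offspring(rng))
    return alive


def mean_z(t_end, runs, seed=0):
    """Average z(t_end) over independent runs (assumed: unit-rate exponential lives, N = 2)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += simulate_z(t_end, lambda r: r.expovariate(1.0), lambda r: 2, rng)
    return total / runs


if __name__ == "__main__":
    t = 2.0
    print("simulated mean of z(t):", mean_z(t, 20000))
    print("e^t for comparison:    ", math.exp(t))
```

Any other life-length distribution G or replacement law q_n can be supplied through the two callables.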

28. Cuboidal Lattices. G. S. Watson, Institute of Statistics, North Carolina. 

Yates has given two series of partially balanced incomplete block designs, square and
cubic lattices, which enable the experimenter to test respectively $k^2$ and $k^3$ varieties in
blocks of size k. Harshbarger has recently given a series of designs, rectangular lattices,
which supplement Yates' square lattices.

In this paper two series of designs are given called cuboidal lattices, supplementing the
cubic lattice series. They may be used to test respectively $k^2(k + 1)$ and $k(k + 1)^2$ varieties
in blocks of k, when the number of replications is a multiple of 3. Interblock information
may be recovered. The first series has a relatively simple analysis and should prove useful.
This work was sponsored by the Office of Naval Research.

29. Transformations Induced by Series Approximation of Prior Probability 
Amplitude. Archie Blake, Office of The Surgeon General, U. S. Army. 

Consider a class A of mutually exclusive and exhaustive possible outcomes of a test. 
(We assume A finite; this condition can under suitable conditions be removed at a later 
stage by a limiting process.) For a hypothesis h, let u be the vector whose value, for each 
member a of A, is the square root of the prior probability of a and h jointly. This vector
is called the probability amplitude; its norm, the scalar product u'u, is proportional to the
prior probability of h, the constant of proportionality being determined by comparing the
norms of the u's for all h. Let the test leave the alternatives of a subclass B of A still
possible, while ruling out the members of A - B. Represent this test by a vector r having
the value 1 on B and 0 on A - B. Define d on AA as a matrix equal to r on the main diagonal
and zero elsewhere. The posterior probability is proportional to the form value
u'du, the norm of the projection of u on a subspace determined by suppressing the coordinates
of A - B. Consider the transformation u = tv, t being a matrix on AA and v
a vector on A. Then u'du takes the form v't'dtv. Denote t'dt by e. If u is approximated
as a partial sum of the series tv, i.e. by truncating v with a subclass C of A, the truncation
induced on e is that with the minor on CC. (How much of the prior probability norm is





retained with a particular truncation is most easily seen if t is orthogonal, for then the
transform of u'u is v'v.)

For example, in an agricultural experiment, let A be the composite of P, the class of
plots, and Y, the class of possible yields on a plot. Then u takes the form of a second
order tensor or matrix on PY, while d and t are fourth order tensors. For some member
y of Y, it often happens that some of the initially most probable, numerous, and economically
consequential hypotheses will be such that for them the values of u(y) are predominantly
high on some row of plots, low on another row, etc. The transformation u = tv
on P induces the transformation e = t'dt; this is R. A. Fisher's transformation, performed,
however, on d instead of on the yields themselves. The truncation of v and r corresponds
to Fisher's relegating the higher interactions to error. This calculation may be accompanied
by a linear transformation on Y, e.g. in series of orthogonal functions. (Such
series are not subject to the disadvantage of classical Gram-Charlier series, which are
expressed in terms of the probability instead of its square root, that their partial sums
can be at places negative.)
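
The amplitude construction can be checked on a small numerical case. The sketch below is illustrative only: the function names and the two hypotheses with their joint prior probabilities are invented for the example. It forms u as the square roots of the joint prior probabilities, evaluates the form value u'du for a test vector r, and normalizes over the hypotheses to obtain posterior probabilities.

```python
import math


def amplitude(joint_prior):
    """Probability amplitude: square roots of the joint prior probabilities of a and h."""
    return [math.sqrt(p) for p in joint_prior]


def form_value(u, r):
    """u'du, where d is the diagonal matrix carrying the 0/1 test vector r."""
    return sum(ri * ui * ui for ui, ri in zip(u, r))


def posterior(joint_priors, r):
    """Posterior probability of each hypothesis, proportional to u'du."""
    weights = {h: form_value(amplitude(p), r) for h, p in joint_priors.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical joint prior probabilities over three outcomes a1, a2, a3.
    joint_priors = {"h1": [0.1, 0.2, 0.2], "h2": [0.3, 0.1, 0.1]}
    r = [1, 1, 0]  # the test rules out a3, leaving B = {a1, a2}
    print(posterior(joint_priors, r))
```

Here the posterior probabilities are 3/7 and 4/7, the same values that a direct application of Bayes' rule gives.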


30. On the Utilization of Marked Specimens in Estimating Populations of 
Flying Insects. Cecil C. Craig, University of Michigan, Ann Arbor. 

The experimenter catches flying insects, say butterflies, marks and immediately releases
them. It is assumed that all the insects in a segregated area are equally liable to capture
whether unmarked or marked, even several times, and that the population is stable for the
period over which a series of captures is made. From the record of insects caught once,
twice, three times, and so on, the problem is to estimate the total population. Two mathematical
models which seem appropriate are considered and four methods of estimation are
compared with respect to the large sample variances of the estimates they give.


31. On a Probability Distribution. Max A. Woodbury, University of Michigan,
Ann Arbor.


In this paper the probability of x successes in n trials of an event is computed for the
case when the probability of success in a given trial depends only on the number of previous
successes. The solution P(n, x) satisfies the equation of partial differences

$P(n + 1, x + 1) = (1 - q_x)\,P(n, x) + q_{x+1}\,P(n, x + 1).$

The boundary conditions are obviously $P(0, 0) = 1$ and $P(n, x) = 0$ for $x < 0$ or $x > n$.
The solution of this equation is obtained by use of a generating function, and P(n, x)
proves to be the xth term in the expansion of $q^n$ by means of Newton's divided difference
formula on the values $q_0, q_1, \ldots, q_x$, in the case when $q = 1$:

$P(n, x) = p_0 p_1 \cdots p_{x-1} \sum_{i=0}^{x} \frac{q_i^n}{(q_i - q_0) \cdots (q_i - q_{i-1})(q_i - q_{i+1}) \cdots (q_i - q_x)}.$

In the case $p_i = p$, one has the result

$P(n, x) = \frac{p^x}{x!} \frac{d^x q^n}{dq^x},$

which yields the usual result on simplification.
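
The recursion lends itself to direct computation. The sketch below is an illustration under stated assumptions rather than the author's procedure: p[x] is taken to be the probability of success on a trial preceded by exactly x successes, q[x] = 1 - p[x], and the function names are hypothetical. It builds P(n, x) trial by trial and compares the result with a sum over divided differences of q^n, which requires the q's to be distinct.

```python
from math import comb, prod


def distribution_by_recursion(n, p):
    """Return [P(n, 0), ..., P(n, n)]; p must supply p[0], ..., p[n-1]."""
    dist = [1.0]  # boundary condition P(0, 0) = 1
    for _ in range(n):
        nxt = [0.0] * (len(dist) + 1)
        for x, prob in enumerate(dist):
            nxt[x] += (1.0 - p[x]) * prob  # failure: success count stays at x
            nxt[x + 1] += p[x] * prob      # success: success count moves to x + 1
        dist = nxt
    return dist


def distribution_closed_form(n, x, p):
    """p_0 ... p_{x-1} times the divided difference of q**n on q_0, ..., q_x (distinct q's)."""
    q = [1.0 - v for v in p]
    total = 0.0
    for i in range(x + 1):
        denom = prod(q[i] - q[j] for j in range(x + 1) if j != i)
        total += q[i] ** n / denom
    return prod(p[:x]) * total


if __name__ == "__main__":
    p = [0.5, 0.4, 0.3, 0.2]  # hypothetical success probabilities after 0, 1, 2, 3 successes
    n = 3
    print(distribution_by_recursion(n, p))
    print([distribution_closed_form(n, x, p) for x in range(n + 1)])
    # With a constant p the recursion gives the binomial distribution.
    print(distribution_by_recursion(4, [0.3] * 4))
    print([comb(4, x) * 0.3 ** x * 0.7 ** (4 - x) for x in range(5)])
```

The closed form needs p[0], ..., p[x], one more entry than the recursion uses when x = n.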


32. Distribution-Free Tests of Data from Factorial Experiments. G. W. Brown
and A. M. Mood, Iowa State College.

A device for avoiding the assumption of normality in analysis of variance problems was





developed by M. Friedman (Am. Stat. Assoc. Jour., Vol. 32 (1937), pp. 675-701) in which the
values of the observations were replaced by their ranks.

An alternative approach is presented here in which medians are used to construct certain
contingency tables, and the various null hypotheses of interest are easily tested by means
of the ordinary chi-square criterion applied to such tables. These tests:

(1) Avoid the assumption of normality. 

(2) Are particularly sensitive to differences in locations of cell distributions but not to 
their shapes. 

(3) Usually require very little arithmetic computation. 

The tests and the relevant distribution theory have been worked out for some of the 
simpler experimental designs. 
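
The simplest instance of the idea, several samples compared in a one-way layout, can be sketched as follows. This is the familiar median test in its most elementary form and is offered as an assumption about the general construction, not as the authors' procedure for factorial designs; the function name is hypothetical. Each observation is classed as above or not above the median of the pooled data, and the ordinary chi-square criterion is computed from the resulting two-row contingency table.

```python
from statistics import median


def median_test_chi_square(samples):
    """Return (chi-square statistic, degrees of freedom) for a 2 x k median table."""
    grand_median = median([v for s in samples for v in s])
    above = [sum(1 for v in s if v > grand_median) for s in samples]
    not_above = [len(s) - a for s, a in zip(samples, above)]
    total = sum(len(s) for s in samples)
    chi2 = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        for observed, s in zip(row, samples):
            expected = row_total * len(s) / total
            if expected > 0:  # skip empty rows, e.g. when all observations are equal
                chi2 += (observed - expected) ** 2 / expected
    return chi2, len(samples) - 1


if __name__ == "__main__":
    print(median_test_chi_square([[1, 2, 3, 4], [5, 6, 7, 8]]))
```

Only counts enter the computation, so very little arithmetic is needed and no assumption of normality is made.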


33. On Sums of Symmetrically Truncated Normal Random Variables. Fred 
C. Andrews and Z. W. Birnbaum, University of Washington, Seattle.

Let $X_a$ be the random variable with the probability density

$f_a(X) = C e^{-X^2/2}$ for $|X| \le a$, $f_a(X) = 0$ for $|X| > a$,

and let $S_n^{(a)} = \sum_{i=1}^{n} X_a^{(i)}$, where the $X_a^{(i)}$ are independent determinations of $X_a$.

The problem considered is: for given n, $T > 0$, $\epsilon > 0$, determine a such that
$P(|S_n^{(a)}| > T) = \epsilon$. The exact solution of this problem would require laborious computations.
In this paper a method is given for obtaining approximate values of a which are
"safe", i.e. such that $P(|S_n^{(a)}| > T) \le \epsilon$.

34. On the Foundation of Statistics. Max A. Woodbury, University of 
Michigan, Ann Arbor. 

The results in this paper are part of the author's University of Michigan dissertation,
"Probability and Expected Values." The work covered by this paper was sponsored by
the Office of Naval Research. One may take the notion of an expected value as the basis
for the theory of Statistics; i.e. a linear functional on a linear space of random variables
(real valued functions defined over a population). The space is called statistical if it contains
all constant functions and the expected value of such constant functions is just the
constant and if the expected value of a non-negative function is non-negative. A statistical
space is called strong if it contains with a random variable also the random variable whose
values are the absolute values of the given random variable. Every expected value defines
a probability measure over a quorum of subsets of the population and it is shown that the
integral of the random variable, if it exists, coincides with the expected value. Further
it is shown that if the statistical space is strong the integral necessarily exists and also
that a necessary and sufficient condition that the quorum be a field is that the statistical
space be strong.

35. Finitely Additive Probability Functions. Max A. Woodbury, University
of Michigan, Ann Arbor.

The results in this paper are part of the work in the author's University of Michigan
dissertation, "Probability and Expected Values." The work covered by this paper was
sponsored by the Office of Naval Research. A quorum is a family of sets that contains
with each pair of disjoint sets also their union and also the complement of any of its sets.
Trivially a quorum is required to contain at least one set and hence at least the universe set
or population and the empty set. An extension of the notion of a finitely additive probability
measure function to quorums is given and proved to be equivalent to the usual





definition in case the quorum is a field of sets. The extension of a quorum of sets relative
to the probability measure function is investigated using the properties of the inner and
outer measure. The upper and lower integrals are defined and a condition for the existence
of the integral is given. When the quorum is a field it is shown that integrability of a
function implies the existence of the distribution function. This last result is well known 
in the case where the probability measure function is completely additive. 

36. On Inverting a Matrix via the Gram-Schmidt Orthogonalization Process.
Max A. Woodbury, University of Michigan, Ann Arbor.

The application of the classical Gram-Schmidt orthogonalization process to the factorization
of a correlation matrix is accomplished by considering the inner product $[x, y] = E(xy)$
in the linear space determined by the statistical variables $x_1, x_2, \ldots, x_n$. In this
way a representation of the original set of statistical variables in terms of an orthonormal
set is obtained. (By an orthonormal set we mean a set $\xi_1, \xi_2, \ldots, \xi_n$ such that $E(\xi_i \xi_j) = 0$
for $i \ne j$ and $E(\xi_i^2) = 1$.) The matrix of coefficients $B = (b_{ij})$, where $x_i = \sum_j b_{ij} \xi_j$, has
the property that $C = BB'$ where $C = (E(x_i x_j))$ and $'$ denotes the transpose. Further the
matrix B is triangular hence $B^{-1}$ is readily computed, from which one obtains at once
$C^{-1} = (B^{-1})' B^{-1}$. The quantities $b_{ij}$ are readily obtainable by the method of determinants
(Dwyer and Waugh, Annals of Math. Stat., Vol. 16 (1945), pp. 259-271, cf. pg. 264) formerly
called the method of multiplication and subtraction with division.
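
The factorization and inversion can be sketched in a few lines. The code below is an illustration, not the computing scheme of the paper: it obtains the triangular B by the familiar square-root recursion rather than by the method of determinants cited above, and the function names are hypothetical. It then inverts B by forward substitution and forms the inverse of C as the product of the transposed inverse of B with the inverse of B.

```python
import math


def triangular_factor(c):
    """Lower triangular B with C = B B', for a positive definite symmetric matrix C."""
    n = len(c)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(b[i][k] * b[j][k] for k in range(j))
            b[i][j] = math.sqrt(s) if i == j else s / b[j][j]
    return b


def invert_lower_triangular(b):
    """Inverse of a lower triangular matrix by forward substitution."""
    n = len(b)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / b[i][i]
        for j in range(i):
            inv[i][j] = -sum(b[i][k] * inv[k][j] for k in range(j, i)) / b[i][i]
    return inv


def invert_correlation_matrix(c):
    """Inverse of C computed as (B^-1)' B^-1."""
    inv_b = invert_lower_triangular(triangular_factor(c))
    n = len(c)
    return [[sum(inv_b[k][i] * inv_b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    c = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]  # hypothetical correlations
    for row in invert_correlation_matrix(c):
        print(row)
```

Because B is triangular, both the factorization and the inversion proceed row by row without pivoting.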

37. Certain Properties of the Multiparameter Unbiased Estimates. G. R.
Seth, Iowa State College.

If $\theta^* = (\theta_1^*, \ldots, \theta_q^*)$ is an unbiased vector estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_q)$ in the
density function $p(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_q)$ having the smallest concentration
ellipsoid among the class of unbiased estimates of $\theta$, and further if t is any statistic of q
components having $E(t) = 0$ and finite covariance matrix, then t is uncorrelated with $\theta^*$.

If a set of sufficient statistics $(T_1, T_2, \ldots, T_p)$, $p < q$, exists for estimating $\theta$, then
corresponding to any unbiased vector estimate $\phi^*$ of $\theta$, there exists an unbiased estimate
of $\theta$ depending on $T_1, T_2, \ldots, T_p$ alone, where the latter has a concentration ellipsoid
equal to or contained in that of the former.

When $q = 1$, and $\phi^*$ has the smallest variance among the class c formed by unbiased estimates
of $\theta$ which are functions of $\theta^*$ having a finite variance, and the set of polynomials
with respect to the distribution function of $\phi^*$ is complete, then $\phi^*$ is the only element in
the class c. For $q > 1$, the result holds when the "variance" is replaced by the "concentration
ellipsoid."

38. A Class of Lower Bounds for the Variance of Point Estimates. Douglas
Chapman, University of California, Berkeley.

A class of lower bounds for the variance of point estimates is derived by means of the
calculus of finite differences under very weak restrictions and it is shown that they give
valid lower bounds for certain parameter estimation problems for which the Cramér-Rao
formula is invalid. In some cases even when the latter lower bound exists a sharper lower
bound may be found in the class here defined. On the other hand when it exists, the Cramér-Rao
lower bound is asymptotically superior to any of this class.

39. Standard Errors and Tests of Significance for Interpolated Medians.
Churchill Eisenhart and Miriam L. Yevick, National Bureau of Standards.





If a sample of N observations is grouped by a sequence of class intervals with boundaries
$-\infty, \ldots, x_{-2}, x_{-1}, x_0, x_1, x_2, \ldots, +\infty$, where $x_0$ is the largest boundary point for which
the observed 'fraction below', $p_B$, is less than $\tfrac{1}{2}$, and $x_1$ is the smallest boundary point for
which the observed 'fraction above', $p_A$, is less than $\tfrac{1}{2}$, so that the observed 'central fraction',
$p_C$, between $x_0$ and $x_1$ is positive, then, at least for the case of N large, standard textbooks
take as the median of the grouped data the interpolated median,

$m = x_0 + b(x_1 - x_0)$

where

$b = (\tfrac{1}{2} - p_B)/p_C.$

The literature is silent regarding the sampling properties of such medians, and regarding
tests of significance appropriate to them. Let $P_B$ and $P_C$ be the population fractions
below $x_0$, and between $x_0$ and $x_1$, respectively, and let $\mu$ and $\beta$ be the population analogs
of m and b obtained by replacing $p_B$ and $p_C$ in the above equations by $P_B$ and $P_C$, respectively.
It is shown that m is asymptotically normally distributed about $\mu$ so defined with
asymptotic variance given by

$\frac{P_B(1 - P_B) - 2\beta P_B P_C + \beta^2 P_C(1 - P_C)}{N Y_C^2}$

where $Y_C = P_C/(x_1 - x_0)$ = ordinate of 'central rectangle' of 'population histogram'. The
classical formula for the variance of a median can be obtained as the limit of the above
when $(x_1 - x_0) \to 0$ with $P_B \to \tfrac{1}{2}$.

In addition, tests of hypotheses regarding the value of the 'interpolated median of the
population', $\mu$, and regarding the difference, $\mu_2 - \mu_1$, of the interpolated medians of two
populations, are developed (1) by utilizing the above asymptotic results, and (2) by utilizing
the Neyman-Pearson likelihood-ratio-test approach.
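
The interpolated median is easily computed from a grouped frequency table. The sketch below is illustrative and rests on assumptions: the function names are hypothetical, the class boundaries are taken to be finite, and the variance expression coded is the one that follows from the usual large-sample argument applied to m = x0 + b(x1 - x0), with N the sample size.

```python
def interpolated_median(boundaries, counts):
    """boundaries: increasing class boundaries; counts[k] falls between boundaries[k] and [k+1]."""
    n = sum(counts)
    below = [sum(counts[:k]) / n for k in range(len(boundaries))]  # fraction below each boundary
    i0 = max(k for k in range(len(boundaries)) if below[k] < 0.5)
    i1 = min(k for k in range(len(boundaries)) if 1.0 - below[k] < 0.5)
    x0, x1 = boundaries[i0], boundaries[i1]
    p_b = below[i0]               # observed fraction below x0
    p_c = below[i1] - below[i0]   # observed central fraction
    b = (0.5 - p_b) / p_c
    return x0 + b * (x1 - x0)


def asymptotic_variance(P_B, P_C, x0, x1, N):
    """Large-sample variance of the interpolated median for population fractions P_B, P_C."""
    beta = (0.5 - P_B) / P_C
    Y_C = P_C / (x1 - x0)  # ordinate of the central rectangle
    numerator = P_B * (1 - P_B) - 2 * beta * P_B * P_C + beta ** 2 * P_C * (1 - P_C)
    return numerator / (N * Y_C ** 2)


if __name__ == "__main__":
    print(interpolated_median([0, 10, 20, 30], [2, 5, 3]))
    print(asymptotic_variance(0.2, 0.5, 10, 20, 100))
```

When P_B is one half the coefficient beta vanishes and the expression reduces to 1/(4 N Y_C^2), the classical form for the variance of a median.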


40. Some Efficient Range-Estimates of Variation. Nilan Norris, Hunter 
College, New York. 

The commonly used sample range (in the sense of the difference between the largest and 
smallest of the variates) is one of an unlimited number of range or difference-measures 
which can be used to scale parent populations. For samples drawn from a Type III universe,
the maximum-likelihood estimate of dispersion is given by A - G, where A is the
sample arithmetic mean and G is the sample geometric mean. For samples drawn from a 
Type V universe, a 100% efficient estimate of absolute variation is given by G — H, where 
G is the sample geometric mean and H is the sample harmonic mean. Under certain general 
conditions usually fulfilled, the standard errors of both of these range-measures of absolute 
dispersion may be estimated from expressions obtained by application of the Laplace- 
Liapounoff theorem. The two parametric methods of estimating absolute variation as 
developed in this paper are likely to be most useful when the form of the parent universe 
is known, and it is either too expensive or impossible to obtain samples large enough to 
permit the use of inefficient estimates. An example of such a case is the learning curve
encountered in the analysis of frequency of occurrence of aircraft accidents by hours of
flying experience of pilots in training. E. J. G. Pitman, Proc. Camb. Phil. Soc., Vol. 33
(1937), pp. 217-218, has discussed the scaling of the Type III distribution. The method 
of scaling given by Pitman differs from the method of estimation developed in this paper for 
the Type III universe. 
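
The two difference-measures named above are simple to compute. The following sketch (hypothetical function name, positive observations assumed) returns A - G and G - H for a sample; it illustrates the arithmetic only and does not reproduce the standard-error expressions of the paper.

```python
import math


def mean_differences(sample):
    """Return (A - G, G - H) for a sample of positive values."""
    n = len(sample)
    a = sum(sample) / n                                   # arithmetic mean A
    g = math.exp(sum(math.log(v) for v in sample) / n)    # geometric mean G
    h = n / sum(1.0 / v for v in sample)                  # harmonic mean H
    return a - g, g - h


if __name__ == "__main__":
    print(mean_differences([1.0, 4.0]))
```

Both differences are non-negative for any positive sample, and both vanish when all the observations are equal.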



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 


Dr. Franz L. Alt has resigned his position with the Ballistic Research Laboratories
at Aberdeen to join the National Bureau of Standards where he is in charge
of the Computation Laboratory of the “National Applied Mathematics
Laboratory.”

Dr. Edward W. Barankin has been promoted to Assistant Professor and Research
Associate at the Statistical Laboratory, University of California, Berkeley,
California.

Dr. Stanley Clark has accepted an associate professorship of Education at 
the College of Education, University of Saskatchewan, Saskatoon, Canada. 

Dr. Gerald J. Cox has resigned his position as Research Chemist in the Chemical
Division of Corn Products Refining Co., Argo, Illinois to accept an appointment
as Professor of Dental Research in the School of Dentistry of the University
of Pittsburgh.

Mr. S. Lee Crump has resigned his assistant professorship at Iowa State College
to accept a position in the Atomic Energy Project, University of Rochester.

Dr. John H. Curtiss, Chief of the National Applied Mathematics Laboratories
of the National Bureau of Standards, has assumed temporary additional duties
as Acting Chief of the Institute for Numerical Analysis. The Institute for
Numerical Analysis, located on the U.C.L.A. campus, was established by the
National Bureau of Standards with the support of the Office of Naval Research
and the United States Air Force for the two-fold purpose of pursuing mathematical
research aimed at the development of numerical techniques for the full exploitation
of the newer large-scale electronic computing machines and for performing
numerical computations basic to the extension of the frontiers of science.

Mr. Walter T. Federer has resigned his position at the Statistical Laboratory
at the Iowa State College to accept a position as Professor of Biological Statistics
in the Department of Plant Breeding at Cornell University.

Dr. John Gurland, who received his Ph.D. in mathematical statistics from the
University of California in August, 1948, is now a Benjamin Pierce Instructor in 
Mathematics at Harvard University. 

Dr. Joseph L. Hodges, Jr. has been promoted to Instructor and Research Associate
at the Statistical Laboratory, University of California, Berkeley.

Dr. Cyril J. Hoyt has resigned his position as Research Associate with the
Department of Education at the University of Chicago to accept an appointment
as Associate Director of the Bureau of Educational Research, University of
Minnesota.

Dr. Tjalling C. Koopmans has been promoted to Professor of Economics at



the University of Chicago and also Director of Research of the Cowles Commission
for Research in Economics.

Dr. Eugene Lukacs, formerly at Our Lady of Cincinnati College, has accepted
a position as statistician at the United States Naval Ordnance Test Station at
Inyokern, California.

Mr. Frank Jones Massey, Jr., who has been in the Department of Mathematics 
at the University of Maryland, has accepted an assistant professorship in the 
Department of Mathematics at the University of Oregon, Eugene, Oregon. 

Miss Judith Moss has resigned her position at the National Bureau of Economic
Research and is now with the Port of New York Authority as an Economic
Analyst in the Planning Bureau.

Dr. Richard Otter has accepted an assistant professorship in the Department
of Mathematics at the University of Notre Dame. 

Dr. Nathan Grier Parke, III has been appointed Research Fellow of the 
Massachusetts General Hospital and Associated Research Director of the Har¬ 
vard Piatric Study. 

Dr. Joseph A. Pierce is now serving as Chairman of the Division of Natural 
Science and Mathematics at the Texas State University for Negroes, Houston 
4, Texas. 

Dr. Saul B. Sells, former Assistant to the President of the A. B. Frank Co. of 
San Antonio, Texas, has joined the staff of the Department of Psychology of the 
Air University, School of Aviation Medicine, Randolph Field, Texas. 

Dr. Otis A. Pope, who was with the Office of Foreign Agricultural Relations,
U. S. Department of Agriculture, Technical Collaboration Branch, Washington,
D. C., died September 28th, 1948.


Special Summer Session in Survey Research Techniques 

The Survey Research Center of the University of Michigan will hold its special 
summer session in Survey Research Techniques from July 18 to August 13, 1949.

The following courses will be offered: Introduction to Survey Research, Survey 
Research Methods, Sampling Methods in Survey Research (introductory and 
advanced), Mathematics of Sampling, Statistical Methods in Survey Research, 
Techniques of Scaling. 

In addition the introductory courses will be given from June 20 to July 16. 
This will permit students who are attending the full eight-week summer session 
of the University (June 20 to August 13) to register for the introductory courses 
during the first four weeks. 

It is expected that this special session will attract men and women employed
in market research or other statistical work and university instructors and graduate
students with a particular interest in this area of social science research.

All courses are offered for graduate credit and students must be admitted by 





the Graduate School. Inquiries should be addressed to the Survey Research 
Center, University of Michigan, Ann Arbor, Michigan. 



Summer Courses in Statistics at Michigan 

In addition to the special courses in Survey Research Techniques, the following
courses of special interest to students of statistics are among those offered
by the mathematics department of the University of Michigan in the Summer
Session, June 20 to August 13: Finite Differences (Fischer), Probability (Copeland),
Theory of Statistics I and II (Carver), Significance Tests (Dwyer), Computational
Methods (Dwyer), Theory of Estimation and of Significance Tests
(Craig) and Seminar (Craig).


The International Congress of Mathematicians 


No summer meeting of the Institute of Mathematical Statistics is planned for 
1950 because of the meeting of the International Congress of Mathematicians 
which will be held in Cambridge, Massachusetts, August 30 to September 6, 1950.
The following statement has been prepared by the organizing committee: 

An International Congress of Mathematicians will be held in Cambridge, Massachusetts,
in 1950 under the auspices of the American Mathematical Society.
The Society originally planned to act as host for a Congress in September, 1940,
which was also scheduled to meet in Cambridge. At the 1936 Congress in Oslo,
Norway, the invitation for the 1940 Congress was issued by the American delegation
in the name of the American Mathematical Society. Plans for the 1940
Congress were practically completed when the outbreak of World War II in 
September, 1939, made it necessary for the Society to postpone the Congress to 
a more favorable date. An Emergency Committee was established to carry on in 
the interim and, on recommendation of this Committee, the Council of the 
Society voted to hold the Congress in 1950. 


The 1950 Congress will be the third International Congress of Mathematicians
to be held on the continent of North America. The first was held at Northwestern
University in 1893 and the second at the University of Toronto in 1924.
International Congresses were held at intervals of approximately four years,
except when war intervened, until 1936. There has been no international gathering
of mathematicians since that time and it is the sincere hope of the Organizing
Committee that the gathering in 1950 will be a truly international one,
that the American mathematicians will attend in large numbers, and that all
other countries will be well represented. The Council of the American Mathematical
Society has voted unanimously to hold a Congress which will be open to
mathematicians of all national and geographical groups.

Time and Place. The dates for the Congress have been fixed as August 30- 
September 6, 1950. Harvard University will be the principal host institution. 
A number of other institutions in metropolitan Boston will join in the entertainment
of Congress visitors by arranging special features on their campuses.





Type of Congress. In recent years mathematicians have been much impressed
by the success of the conference method for presenting recent research in fields 
where vigorous advances have just been made or are in progress. In view of the 
success of mathematical conferences on special topics which have been held in 
Russia, France and Switzerland and, more recently, at the Princeton Bicentennial 
Celebration, the 1950 Congress will include Conferences in several fields. For 
the 1940 Congress, Conferences in four fields had been planned. The number of 
Conferences was thus restricted lest the introduction of a promising and novel 
feature result in failure through the dissipation of interest and energy. 

Following the established custom, the Organizing Committee plans to have a 
number of invited hour addresses by outstanding mathematicians. In addition, 
sectional meetings for the presentation of contributed papers not included in 
Conference programs will be held in the following fields: I, Algebra and Theory
of Numbers; II, Analysis; III, Geometry and Topology; IV, Probability and
Statistics, Actuarial Science, Economics; V, Mathematical Physics and Applied
Mathematics; VI, Logic and Philosophy; VII, History and Education.

The official languages of the 1950 Congress will be English, French, German, 
Italian, and Russian. 

Organization. The plans for the Congress are under the supervision of an 
Organizing Committee which was elected by the Council of the American Mathematical
Society in February, 1948. The Chairman is Professor Garrett Birkhoff
of Harvard University and the Vice Chairman is Professor W. T. Martin of
Massachusetts Institute of Technology. Other members of the committee are:
Professors J. L. Doob, G. C. Evans, J. R. Kline, Solomon Lefschetz, Saunders
MacLane, Dean R. G. D. Richardson, Professors Oswald Veblen, J. L. Walsh,
D. V. Widder, Norbert Wiener, and R. L. Wilder. 

Many of the subventions promised for the 1940 Congress are still available. 
A Financial Committee under the chairmanship of Professor John von Neumann 
is endeavoring to secure additional funds. Besides support from Harvard University
and Massachusetts Institute of Technology, generous subventions have
been subscribed for the Congress by the Carnegie Corporation, the Institute for 
Advanced Study, the National Research Council, and the Rockefeller 
Foundation. 

An Editorial Committee under the chairmanship of Professor Salomon Bochner 
will assume responsibility for the publication of the Proceedings of the Congress. 

Professor J. R. Kline of the University of Pennsylvania has been named Secretary
of the Congress and Dr. R. P. Boas, Executive Editor of Mathematical
Reviews, has been designated Associate Secretary.

Entertainment. Harvard University has offered the use of its dormitories and 
dining rooms for mathematicians and their guests for the period of the Congress. 
The Organizing Committee hopes that it will be possible to furnish room and 
board without charge to all mathematicians from outside continental North 
America who are members of the Congress. Congress membership fees and rates 
for room and board will be announced well in advance of the opening of the 
Congress. 




The Entertainment Committee, of which Professor L. H. Loomis of Harvard 
University is Chairman, is planning many interesting features, including a reception,
garden party, symphony concert, and banquet. It is hoped that American
mathematicians will be able to assist in the entertainment by putting their
automobiles at the disposal of the Entertainment Committee for trips to be 
made out of Cambridge. 

Every effort will be made to facilitate the travel at reasonable cost of foreign 
participants while in the United States. Previous to the Congress, opportunity 
will be given them to see New York City under the guidance of some mathematicians.

Information. Detailed information will be sent in due course to individual 
members of the American Mathematical Society and to foreign mathematical 
societies and academies. Others interested in receiving information may file 
their names in the office of the Society, and such persons will receive from time 
to time information regarding the program and arrangements. 

Communications should be addressed to the American Mathematical Society, 
531 West 116th Street, New York City 27, U. S. A.


New Members 

The following persons have been elected to membership in the Institute 
(August 16, 1948 to November 30, 1948)

Alman, John E., M.A. (Claremont Colleges) Instructor in Mathematics, College of Liberal 
Arts, Boston University, $16 Gardner Road, Brookline J,0 , Massachusetts. 

Andrian, Jane F., M.S. (Western Reserve Univ.) Graduate student at University of California,
1388 C Ashby Avenue, Berkeley 8, California.

Arbuckle, Richard A., B.S. (Baldwin Wallace College) Research-Industrial Fellow at
Purdue University, F Ph.A. 630-3 Airport Road, Lafayette, Indiana.

Barankin, Edward W., Ph.D. (Univ. of Calif.) Assistant Professor of Mathematics and
Research Associate in Statistical Laboratory, University of California, Berkeley,
California.

Blum, Julius R., Student in mathematical statistics at the University of California, 1967
Acton Street, Berkeley 2, California.

Bronfenbrenner, Mrs. Jean, M.A. (Univ. of Chicago) Research Assistant, Cowles Commission,
University of Chicago, Chicago 37, Illinois.

Burns, Loren V., B.S. (Washburn College, Topeka, Kansas) Technical Director, MFA
Milling Co., Box 1586 S.S.S., Springfield, Missouri. 

Clement, Edwin G., M.B.A. (Univ. of Chicago) Captain, Chief of Management Control,
Headquarters, Strategic Air Command, Andrews Air Force Base, Washington 20, D. C.

Cramer, George F., Ph.D. (Univ. of Missouri) Mathematician, U. S. Navy Department,
Washington, D. C., 11% Quincy Street, Chevy Chase 15, Maryland.

Degan, James W., A.B. (Univ. of Chicago) Research Assistant, Psychometric Laboratory,
University of Chicago, M S East Slst Street, Chicago 37, Illinois.

Dodd, Stuart C., Ph.D. (Princeton) Research Professor of Sociology and Director of Public
Opinion Laboratory, WM-lfith Avenue, N.E., Seattle 5, Washington.

Donnelly, Tom G., M.A. (Queen's Univ.) Graduate student at the University of North
Carolina, Room SIS “B”, Chapel Hill, North Carolina.





Edwards-Davies, Harold D., Special Lecturer, Department of Mathematics, Dalhousie
University, 67 Seymour Street, Halifax, N.S., Canada.

Ellner, Henry, Ch.E. (College of City of New York) Statistician (Physical Sciences),
1-C Oak Grove Drive, Baltimore SO, Maryland.

Feigenbaum, Armand V., M.S. (Mass. Institute of Tech.) General Electric Company,
Room 257, Building 23, Schenectady, New York.

Festinger, Leon, Ph.D. (Univ. of Iowa) Assistant Professor of Psychology, Research Center
for Group Dynamics, University of Michigan, Ann Arbor, Michigan.

Frame, James S., Ph.D. (Harvard) Professor and Head of Department of Mathematics,
Michigan State College, Lansing, Michigan.

French, Benjamin J., M.Ed. (Univ. of New Hampshire) Examiner, Educational Testing
Service, Matthews Road, Keene, New Hampshire.

Gaffey, William R., A.B. (Univ. of Calif.) Research Assistant, University of California,
£806 Grant Street, Berkeley 4, California.

Goodman, Leo A., A.B. (Syracuse University) Research Assistant in Mathematical Statistics
and Graduate student at Princeton University, Fine Hall, Princeton University,
Princeton, New Jersey.

Hader, Robert J., Ph.D. (North Carolina State College) Instructor and Research Assistant,
Institute of Statistics, North Carolina State College, Raleigh, North Carolina.

Haley, Kenneth D., M.S. (Stanford Univ.) Assistant Professor of Mathematics, Acadia
University, Wolfville, Nova Scotia, Canada.

Kahn, Louis B., M.S. (Univ. of Wisconsin) Research Associate, University of Wisconsin,
Box 16-F, Badger, Wisconsin.

Katz, Irving, B.S. (College of City of New York) Statistician, Strategic Air Command,
879—87 Place, S E., Washington 19, D. C. 

Kientzle, Mary J., Ph.D. (Univ. of Ill.) Assistant Professor of Psychology, Department of
Psychology, Washington State College, Pullman, Washington.

Koditschek, Paul, LL.D. (Univ. of Vienna) Research Associate, Scientific Research Service,
Columbia University, 819 W. 18th Street, New York 14, New York.

Levin, Howard S., S.B. (Univ. of Chicago) Electronic Engineer, Glenn L. Martin Co.,
638 Addison Street, Chicago IS, Illinois.

Levine, George J., B.S. (Brooklyn College) Actuarial Mathematician, 6109 — 1st Street, 
North, Arlington, Virginia. 

Liverman, J. G., B.A. (Cantab) Civil Servant, Ministry of Fuel and Power, 81 Ascot Court,
Grove End Road, London, N.W. 8, England.

Loeve, Michel, Ph.D. (Sorbonne, Paris) Professor and Research Associate in Statistical
Laboratory, Durant Hall, University of California, Berkeley, California.

Loo, Ching-Tsu, Ph.D. (Univ. of Chicago) Research Associate, Statistical Laboratory,
University of California, Berkeley, California.

Lubin, Ardie, B.S. (Univ. of Chicago) Statistician, Psychology Department, Maudsley
Hospital, Denmark Hill, S.E. 5, London, England.

Moses, Lincoln E., A.B. (Stanford Univ.) 7 Perry Lane, Menlo Park, California.

Mourier, Edith, Licenciée-ès-sciences (Univ. of Caen, France) Teaching Assistant, Statistical
Laboratory, University of California, Berkeley, California.

Osborne, Ernest L., LL.D. (LaSalle Univ.) Economic Analyst, Department of the Army,
Chancery Apartments, 8180 Wisconsin Avenue, N.W., Washington 16, D. C.

Pabst, William R. Jr., Ph.D. (Columbia Univ.) Quality Control Division, Bureau of Ordnance,
Navy Department, 8480 Quebec Street, N.W., Washington 16, D. C.

Plackett, Robin L., M.A. (Cambridge, England) Lecturer in Mathematical Statistics,
Department of Applied Mathematics, The University, Liverpool 3, England.

Proschan, Frank, M.A. (George Washington Univ.) Research Analyst, 1627 R. St., N.W.,
Washington 9, D. C.





Rau, A. Ananthapadmanabha, M.S. (Iowa State College) Statistician and Agricultural
Meteorologist, Department of Agriculture, Bangalore, Mysore State, India.

Rees, Mina, Ph.D. (Univ. of Chicago) Head, Mathematics Branch, Office of Naval Research,
R27I9, T-3 Building, Washington 25, D. C.

Roberts, Spencer W. Jr., M.S. (Univ. of Michigan) Research Associate, University of
Michigan Department of Engineering Research, SOS Thompson Street, Ann Arbor,
Michigan.

Sanaa, S. C., M.Sc. (Calcutta Univ.) Graduate student in mathematical statistics at
Columbia University, 1120 John Jay Hall, Columbia University, New York 27,
New York.

Schneiderman, Marvin A., B.S. (College of City of New York) Statistician, Biological,
National Institute of Health, T-6, 2215, Bethesda, Maryland.

Schull, William J., Ph.D. (Ohio State Univ.) Student at Ohio State University, Department
of Zoology, Ohio State University, Columbus 10, Ohio.

Schweid, Samuel, B.S S. (College of City of New York) Statistician, Industry Division,
Bureau of the Census, 1U0 Monroe Street, AMP., Washington 10, D. C.

Wallace, David L., B.S. (Carnegie Institute of Tech.) Graduate Student and Teaching
Assistant in Mathematics, Carnegie Institute of Technology, 123 Lawrence Avenue,
Homestead Park, Pennsylvania.

Williams, Evan James, B.C. (Univ. of Tasmania) Research Officer, Section of Mathematical
Statistics, Division of Forest Products, C.S.I.R., P. O. Box 18, South Melbourne,
S.C. 4, Australia.

Zavrotsky, Andres, Head of the Statistical Department of the Venezuela Office for Social 
Insurance, Mercedes a Luneta S9, Caracas, 

Correction of New Members in June, 1948 Issue:

Loizelier, Enrique Blanco, should be written as follows:

Blanco Loizelier, Enrique. (Ph.D.) Professor of Statistics, Economics Faculty, Madrid
University, Spain, Nenion No. 1, Madrid, Spain.


ELECTION OF OFFICERS AND COUNCIL AND REVISION OF BY-LAWS 

At the membership meeting held at Cleveland on December 28, the following
officers and members of the Council were elected: 

President: J. Neyman

President-Elect: J. L. Doob

Council:

3-year term: W. G. Cochran, C. Eisenhart, H. Hotelling, A. Wald

2-year term: W. Feller, P. G. Hoel, H. Scheffé, J. Wolfowitz

1-year term: Gertrude Cox, M. A. Girshick, J. W. Tukey, J. von Neumann





The By-Laws were also revised and further action was taken. More detailed 
accounts of this meeting will be sent directly to the members. 

Paul S. Dwyer 
Secretary 


REPORT ON THE SEATTLE MEETING OF THE INSTITUTE 

The thirty-sixth meeting and fourth Regional West Coast meeting of the 
Institute of Mathematical Statistics was held in Seattle, Washington, November 
26-27, 1948. The sessions of November 27, 1948 were held jointly with the 
Biometric Society (Western N. A. Region). The meeting was attended by 91
persons, including the following 22 members of the Institute: 

F. C. Andrews, E. W. Barankin, Z. W. Birnbaum, A. H. Bowker, D. G. Chapman, R. C.
Davis, W. J. Dixon, E. Fay, M. A. Girshick, P. Horst, H. M. Hughes, J. C. R. Li, F. Massey,
J. Neyman, E. Paulson, Elizabeth L. Scott, Esther Seiden, M. Sobel, Z. Szatrowski, J. R.
Vatnsdal, J. E. Walsh and Zivia S. Wurtele.

At the morning session on November 26, Professor R. M. Winger of the Uni¬ 
versity of Washington as chairman welcomed those attending the meetings, and 
the following program of contributed papers was presented: 

1. Estimation of the Variance of the Bivariate Normal Distribution.

Harry M. Hughes, University of California.

2. Derivation of a Broad Class of Consistent Estimates.

R. C. Davis, NOTS, Inyokern, California.

3. Locally Best Unbiased Estimates.

Edward W. Barankin, University of California.

4. Some Problems Related to the Distribution of a Random Number of Random Variables.

Edward Paulson, University of Washington.

5. Asymptotic Expansions for the Distribution of Certain Likelihood Ratio Statistics.

Albert H. Bowker, Stanford University.

6. On a Problem of Confounding in Symmetrical Factorial Design.

Esther Seiden, University of California.

7. Some Bounded Significance Level Tests of Whether the Largest Observations of a Set are
Too Small.

John E. Walsh, Project RAND, Douglas Aircraft Corp., Santa Monica, Calif.

The afternoon session of November 26, under the chairmanship of Professor 
J. Neyman of the University of California at Berkeley, had the following 
program: 

1. Invited paper: 

Multiple Decision Functions. 

M. A. Girshick, Stanford University.

Contributed papers: 

2. Determination of Optimal Test Length to Maximize the Multiple Correlation Coefficient. 

Paul Horst, University of Washington. 

3. Some Numerical Comparisons of a Non-Parametric Test with Other Tests.

F. J. Massey, University of Oregon.





4. On the Deviation of Extreme Values.

W. J. Dixon, University of Oregon.

5. The Optimum Size of Interval for Making Measurements of a Rocket's Angular Velocity.

Edward A. Fay, University of California.

6. Stationary Time Series Analysis and Common Stock Price Forecasting.

Zenon Szatrowski, University of Oregon.

At the morning session of November 27, with Professor W. F. Thompson of the 
University of Washington as chairman, the program consisted of the following 
papers: 


1. Invited paper:

On the Place of Statistics in Fishery Biology.

Willis S. Rich, Stanford University and U. S. Fish and Wildlife Service.

Contributed papers:

2. Distribution of the Number of Schools of Fish Caught per Boat.

J. Neyman, University of California.

3. Some Problems in Fishery Research to which Statistical Methods are Applicable.

Ralph Silliman, U. S. Fish and Wildlife Service, Seattle, Washington.

4. The Application of the Hypergeometric Distribution to Problems of Estimating and
Comparing Zoological Population Sizes.

Douglas Chapman, University of California.

5. Extension to Multivariate Case of Neyman's Smooth Test.

Elizabeth L. Scott, University of California.

6. A Mathematical Theory of Vitamin A Metabolism in Fish.

Norman E. Cooke, Pacific Fisheries Experimental Station, Vancouver, B. C.

The afternoon session of November 27 was held under the chairmanship of 
Professor F. W. Weymouth of Stanford University, with the following program: 

1. Invited paper:

Statistical Problem of Enumeration of Fish Eggs in the Sea.

Oscar E. Sette, U. S. Fish and Wildlife Service, San Francisco.

Contributed papers:

2. The Interactance Hypothesis.

Stuart C. Dodd, University of Washington.

3. The Employment of Marked Members in Estimation of Animal Populations.

Milner E. Schaefer, Stanford University.

4. Non-Response and Repeated Call-Backs in Opinion Polls.

Z. W. Birnbaum, University of Washington.

5. Statistical Problems Relating to Fisheries.

J. L. Hart, Pacific Biological Station, Nanaimo, B. C.

On November 26, at 6:30 o’clock, there was a dinner for members and guests
at the Edmond Meany Hotel.

Z. W. Birnbaum 


REPORT ON THE CLEVELAND MEETING OF THE INSTITUTE 

The eleventh Annual Meeting of the Institute of Mathematical Statistics was held
at the Statler Hotel, Cleveland, Ohio, on December 27-30, 1948. The
meeting was held in conjunction with the Annual Meeting of the American
Statistical Association. The following 176 members of the Institute were in
attendance:

P. H. Anderson, R. L. Anderson, L. W. Anderson, Max Astrachan, G. J. Auner, T. A.
Bancroft, B. Geoffrey, Z. W. Birnbaum, Archie Blake, E. E. Blanche, C. I. Bliss, Dorothy
S. Brady, A. E. Brandt, G. W. Brown, T. H. Brown, M. A. Brumbaugh, P. T. Bruybro, R. W.
Burgess, I. W. Burr, J. M. Cameron, A. G. Carlton, Harry Carver, P. R. Celia, Uttam Chand,
R. A. Chapman, Edmund Churchill, Herman Chernoff, W. G. Cochran, Jerome Cornfield,
J. H. Cover, Gertrude M. Cox, C. C. Craig, S. L. Crump, J. H. Curtiss, D. A. Darling, W.
L. Deemer, D. B. DeLury, W. E. Deming, Philip Desind, H. F. Dorn, C. W. Dunnett, P. S.
Dwyer, Churchill Eisenhart, Benjamin Epstein, C. D. Ferris, Leon Festinger, C. H. Fischer,
J. C. Flanagan, M. M. Flood, L. R. Frankel, D. A. S. Fraser, H. A. Freeman, Milton Friedman,
H. C. Fryer, E. F. Gardner, R. S. Gardner, H. H. Germond, William Gomberg, E. L.
Green, S. W. Greenhouse, J. Gurland, R. J. Hader, K. W. Halbert, H. J. Hand, M. H. Hansen,
T. E. Harris, Boyd Harshbarger, P. M. Houser, J. F. Hofmann, Harold Hotelling, A. S.
Householder, E. E. Houseman, Helen M. Humes, C. C. Hurd, C. M. Jaeger, R. J. Jessen,
H. L. Jones, Irving Katz, Leo Katz, Harriet J. Kelly, O. Kempthorne, A. W. Kimball, Jr.,
A. J. King, Leslie Kish, L. A. Knowler, Lila F. Knudsen, C. F. Kossack, O. E. Lancaster,
Marvin Lavin, S. B. Littauer, Irving Lorge, F. W. Lott, Jr., Eugene Lukacs, P. J. McCarthy,
C. J. Maloney, John Mandel, Nathan Mantel, H. B. Mann, E. S. Marks, Margaret
Merrell, Helen Michaels, E. B. Mode, A. M. Mood, Nathan Morrison, Dorothy J. Morrow,
J. W. Morse, J. E. Morton, Jack Moshman, Frederick Mosteller, B. D. Mudgett, Hugo
Muench, M. R. Neifeld, R. H. Noel, G. E. Noether, J. L. Northam, H. W. Norton, J. A.
Norton, Jr., E. G. Olds, P. S. Olmstead, Bernard Ostle, A. E. Paull, Paul Peach, M. P.
Peisakoff, E. W. Pike, E. J. G. Pitman, R. A. Porter, J. A. Rafferty, L. J. Reed, Olav Reiersol,
William Reitz, F. D. Rigby, A. C. Rosander, Herman Rubin, Erik Ruist, P. J. Rulon,
Max Sasuly, F. E. Satterthwaite, L. J. Savage, Mary Ann Savas, Marvin Schneiderman,
Elizabeth Scott, G. R. Seth, Jack Sherman, S. S. Shrikhande, C. R. Simms, J. H. Smith,
G. W. Snedecor, Mortimer Spiegelman, B. R. Stauber, F. F. Stephan, Joseph Steinberg,
J. V. Sturtevant, B. J. Tepping, W. R. Thompson, J. W. Tukey, Jan Vchytil, W. R. Van
Voorhis, D. F. Votaw, Jr., F. M. Wadley, Helen M. Walker, D. L. Wallace, W. A. Wallis,
G. S. Watson, Lionel Weiss, Samuel Weiss, E. L. Welker, M. E. Wescott, Phillips Whidder,
D. R. Whitney, S. S. Wilks, C. P. Winsor, Gerald Winston, M. A. Woodbury, T. D. Woolsey,
Holbrook Working, W. J. Youden.

The first session, a joint session with the American Statistical Association, 
was held at 2:00 P.M. on Monday, December 27, at which time a paper entitled 
Statistical Concepts in an Infinite Number of Dimensions was presented by Professor
David H. Blackwell of Howard University. Professor E. J. G. Pitman
of the University of Tasmania was chairman.

The second session of the opening day was devoted to contributed papers in 
mathematical statistics, and was held at 4:00 P.M. in conjunction with the 
American Statistical Association. Professor W. R. Van Voorhis of Fenn College
was chairman. The following papers were presented: 

1. A Necessary Condition for a Certain Class of Characteristic Functions. Preliminary
report. Eugene Lukacs, NOTS, Inyokern, California and Our Lady of Cincinnati
College, Cincinnati, Ohio.

2. Precision of Estimates from Samples Selected under Marginal Restrictions. Preliminary
report. Clifford J. Maloney, Research and Development Department, Camp Detrick,
Frederick, Maryland.





3. Properties of Maximum and Quasi-Maximum Likelihood Estimates of Parameters of a
System of Linear Stochastic Difference Equations with Serially Correlated Disturbances.
Preliminary report. Herman Rubin, Cowles Commission, University of Chicago.

4. The Computation of Maximum Likelihood Estimates of Parameters of a System of Linear
Stochastic Difference Equations with Serially Correlated Disturbances.

Herman Chernoff, Cowles Commission, University of Chicago.

5. Test Criteria for Hypotheses of Symmetry and Definiteness of a Regression Matrix for
Demand Functions.

Uttam Chand, University of North Carolina.

6. The Distribution of Extreme Values in Samples whose Members are Stochastically
Dependent.

Benjamin Epstein, Wayne University.

A session on Teaching Statistical Quality Control was held on Monday evening, 
December 27, jointly with the Ohio Section of the American Society for Quality 
Control and Section on Training of Statisticians of the American Statistical 
Association. Professor Samuel S. Wilks of Princeton University presided at the 
session. The following two papers were presented:

1. Teaching Statistical Quality Control for Town and Gown.

Lloyd A. Knowler, State University of Iowa.

2. Instructional Aids for Statistical Quality Control.

Edwin G. Olds, Carnegie Institute of Technology.

The session concluded with discussion by Professor Irving W. Burr of Purdue 
University, and Professor Theodore H. Brown of Harvard University. 

A session on Review of Statistical Methodology was held jointly with the American
Statistical Association at 2:00 P.M., December 28. Professor Frederick
Mosteller of Harvard University presided. The following papers were presented:

1. Surveys and Sampling.

Philip J. McCarthy, Cornell University.

2. Industrial Applications.

Paul S. Olmstead, Bell Telephone Laboratories.

3. Biology, Physical Sciences and Experimental Design.

W. J. Youden, National Bureau of Standards.

At 4:00 P.M. on Tuesday, December 28, Professor H. C. Fryer of Kansas
State College presided at a joint session with the Biometric Society and Bio¬ 
metrics Section of the American Statistical Association. Papers presented were: 

1. Evaluation of Field Insecticides from Count of Survivors. 

C. I. Bliss and Neely Turner, Connecticut Agricultural Experiment Station.

2. Curved Dosage-Response Curves.

Oscar Kempthorne, Iowa State College.

3. Statistical Variations in Contents of Dry-Filled Ampuls in Current Pharmaceutical
Practice.

M. W. Green, American Pharmaceutical Association, and Lila F. Knudsen, Food and
Drug Administration.

4. A Practical Method for Determining the Mean and Standard Deviation of Truncated
Normal Distributions.

J. Ipsen, Yale University.





The session was concluded with discussion by D. B. DeLury, Ontario Research
Foundation; Lloyd Miller, Sterling-Winthrop Research Institute; C. Eisenhart,
National Bureau of Standards; J. L. Northam, Kansas State College.

On Wednesday, December 29, at 2:00 P.M., Dr. W. Edwards Deming presided 
at a session on Effects of Error in the Independent Variate in Regression Problems.
This meeting was held in conjunction with the Biometric Society and Biometric
Section of the American Statistical Association. Papers presented were:

1. Are There Two Regressions?

Joseph Berkson, Mayo Clinic.

2. Present Status of the Theory.

Jerzy Neyman, University of California.

3. The Identifiability of a Linear Relationship Between Variables which are Subject to
Error.

Olav Reiersol, Purdue University.

These papers were followed by discussion by Professor Churchill Eisenhart, Na¬ 
tional Bureau of Standards, Elizabeth L. Scott, University of California, and 
C. P. Winsor, Johns Hopkins University. 

Professor Boyd Harshbarger, of the Virginia Polytechnic Institute, presided 
at the Wednesday afternoon session on contributed papers in mathematical
statistics. Papers presented were:

1. On Age-Dependent Stochastic Branching Processes.

Richard Bellman and Theodore E. Harris, Stanford University, Palo Alto, California
and the Rand Corporation, Santa Monica, California.

2. Cuboidal Lattices.

G. S. Watson, Institute of Statistics, University of North Carolina.

3. Transformations Induced by Series Approximation of Prior Probability Amplitude.

Archie Blake, Office of the Surgeon General, U. S. Army.

4. On the Utilization of Marked Specimens in Estimating Populations of Flying Insects.

Cecil C. Craig, University of Michigan.

5. On a Probability Distribution.

Max A. Woodbury, University of Michigan.

6. Distribution-Free Tests of Data from Factorial Experiments.

G. W. Brown and A. M. Mood, Iowa State College.

7. On Sums of Symmetrically Truncated Normal Random Variables.

Fred C. Andrews and Z. W. Birnbaum, University of Washington.

8. On the Foundation of Statistics.

(By title). Max A. Woodbury, University of Michigan.

9. Finitely Additive Probability Functions.

(By title). Max A. Woodbury, University of Michigan.

10. On Inverting a Matrix via the Gram-Schmidt Orthogonalization Process.

(By title). Max A. Woodbury, University of Michigan.

11. Certain Properties of the Multiparameter Unbiased Estimates. Preliminary report.

(By title). Gobind R. Seth, Iowa State College.

12. A Class of Lower Bounds for the Variance of Point Estimates.

(By title). Douglas Chapman, University of California.

13. Standard Errors and Tests of Significance for Interpolated Medians.

(By title). Churchill Eisenhart and Miriam L. Yovick, National Bureau of Standards.





A symposium on Randomness and its Testing occupied the 4:00 P.M. session
on Wednesday. Dr. Walter A. Shewhart of the Bell Telephone Laboratories
presided and the following papers were presented:

1. Survey of Available Tests for Randomness.

W. Allen Wallis, University of Chicago.

2. Power Functions of Tests for Randomness.

H. B. Mann, Ohio State University.

3. Power Functions of Non-Parametric Tests.

Ransom Whitney, Ohio State University.

Discussion was led by Bernice Brown, The Rand Corporation; Paul S. Olmstead, 
Bell Telephone Laboratories; E. J. G. Pitman, University of Tasmania. 

The morning session on Thursday, December 30, was a joint session with the 
American Statistical Association, with Professor Jerzy Neyman of the University 
of California presiding. The following two papers were presented upon invitation
of the Institute:

1. Estimating Linear Restrictions on Regression Coefficients for Multivariate Normal
Distributions.

T. W. Anderson, Columbia University.

2. Some Aspects of the Theory of Testing Composite Hypotheses.

E. L. Lehmann, University of California.

The Business Meeting was held at 10:00 A.M. on Tuesday, December 28.
Dr. Churchill Eisenhart presided. A report of this meeting is found elsewhere
in this issue.

W. R. Van Voorhis
Assistant Secretary 


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1948 

The last few years have seen a considerable growth of the Institute. The
upward trend has continued throughout 1948. The Institute has acquired 126
new members during the year, but this gain is to be balanced against losses due
to resignation and suspension for non-payment of dues. The Institute starts
1949 with a membership of about 1,100 as against the membership of
1,037 at the beginning of 1948. While the net gain is still substantial, it is not
quite as much as hoped for, and this may serve as an incentive for an increased
membership drive in 1949. The constantly increasing interest and research
activities in statistical theory and methodology are well reflected in our meetings
and the publications appearing in the Annals.

Meetings. The growth of the Institute in the past few years has brought
about a considerable increase in its various activities. This manifested itself
particularly in the rich programs of the meetings held during the
year 1948. In addition to the usual invited addresses and contributed papers,
the programs included a considerable number of symposia on various important
subjects such as the theory of games (Berkeley, June; Madison, September),
stochastic difference equations (Madison, September), scales of measurement
(New York, April), sampling for industrial use (Berkeley, June), etc. The
eleventh summer meeting was held in conjunction with the meetings of the
American Mathematical Society and the Econometric Society (Madison, September).
The eleventh annual meeting (Cleveland, December) was held in conjunction
with the American Statistical Association, Econometric Society and
Biometric Society. There were also three regional meetings: New York (April),
Berkeley (June) and Seattle (November). The Berkeley meeting was held
in conjunction with the Pacific Division of the American Association for the
Advancement of Science and some of the sessions of the Seattle meeting were
sponsored jointly with the Biometric Society.

To facilitate the organization of meetings and arrangements of programs,
instead of a single program committee there were three program committees
appointed, one for Eastern, one for Mid-Western and one for Far-Western meetings.
These committees consisted of the following members. Eastern Committee:
W. G. Cochran, C. Eisenhart (Chairman), F. Mosteller, and J. Wolfowitz;
Mid-Western Committee: C. C. Craig, H. B. Mann, and A. M. Mood
(Chairman); Far-Western Committee: Z. W. Birnbaum, M. A. Girshick, P. G.
Hoel, and J. Neyman (Chairman). To coordinate the work of these three program
committees, a coordinating committee was appointed consisting of J. W.
Tukey (Chairman) and the three chairmen of the three program committees.
This committee was also charged with the responsibility of making recommendations
to the Board of Directors as to times and places for future meetings.
Another innovation introduced during the past year was the appointment of
assistant secretaries in connection with the meetings. S. B. Littauer acted as
assistant secretary for the New York meeting, K. J. Arnold for the summer
meeting in Madison, Z. W. Birnbaum for the Seattle meeting and W. R. Van
Voorhis for the Cleveland meeting. The assistant secretaries were charged with
the task of looking after the local arrangements that had to be made in connection
with the meetings. The appointment of assistant secretaries proved to
be a great success not only in facilitating the necessary local arrangements for
meetings but also in relieving the burden on the secretary’s office. On the basis
of this year’s experience, it seems very desirable to continue with this practice
in the future.

No Rietz Memorial lecture was given in 1948 in accordance with a decision 
of the Board of Directors that these lectures should not be given every year. 
It is planned, however, to have a Rietz lecture for 1949 and the Board of Direc¬ 
tors invited J. Neyman to deliver it. 

The New Constitution. One of the major events of the year was the adoption
of the new constitution at the meeting in Madison. The growth of the Institute
in recent years made parts of the old constitution obsolete and the need for a revision
was apparent. Our thanks are due to the Committee on Planning and
Development which has devoted much time and consideration to the study of
the problem and prepared a draft of a revised constitution. M. H. Hansen was
chairman of this Committee. Other members were: J. H. Curtiss, W. G.
Cochran, W. Feller, J. Neyman, H. W. Norton, F. F. Stephan, J. W. Tukey,
and W. A. Wallis. A draft of the new By-Laws was prepared by J. W. Tukey,
who acted as a subcommittee of the Committee on Planning and Development.

Annals. The growth of the Institute during the past few years has manifested
itself also in a constantly increasing number of manuscripts submitted for
publication in the Annals. While it is very gratifying to see this upward trend,
it raises some problems of financial nature. At the rate manuscripts are com¬ 
ing in, an expansion of the publication facilities of the Institute would seem 
very desirable. Increase of the volume of the Annals would, however, mean 
increased cost and the present financial situation of the Institute could not 
allow such an additional burden unless some new sources of income can be found. 
Apart from a possible increase in the cost of printing the Annals, it seems that 
additional expenditures will be necessary for secretarial help in 1949. It was 
decided at the membership meeting in Madison that additional funds be raised 
through the contributions of universities and other organizations with strong 
interest in mathematical statistics and through the contributions of the members. 
Appeals for such contributions were sent out and it is hoped that there will be a 
generous response. 

The new constitution permits the appointment of responsible Associate Editors.
This brings up the whole question of editorial set-up and policies. A
committee with S. S. Wilks as chairman was appointed to make a thorough study
of the Institute’s publication experience and to make recommendations as to
publication policies and editorial set-up. Other members of this committee are:
W. G. Cochran, W. Feller, M. A. Girshick, J. Neyman, P. S. Olmstead, W. A.
Wallis and J. Wolfowitz. The committee gave much thought and consideration
to the problems involved and will report to the newly elected officers and
Council.

The Annals has developed under the leadership of the Editor, S. S. Wilks, 
to one of the outstanding professional journals. I am sure that I can speak for 
all our members in expressing the Institute’s indebtedness to S. S. Wilks for his 
untiring and most successful work. 


Committees. The problem of classification of statisticians in the Government
service is naturally of considerable importance to the statistical profession. A
committee consisting of W. E. Deming (chairman) and C. Eisenhart was appointed
to make a thorough study of this question with a view to advising the
Civil Service Commission. The committee prepared a report in which three
main categories of statisticians in Government Service are distinguished: mathematical
statisticians, statistical analysts and data-collecting statisticians. The
report was transmitted to the Civil Service Commission with the approval of the
Board of Directors. The members of this committee are to be commended for
the work they have done in spite of severe limitation of time allotted
by the Civil Service Commission. The work on the problem of classification
of statisticians still goes on and a committee of experts consisting of members
of the Washington Statistical Society, the Institute of Mathematical Statistics,
and the American Statistical Association has been set up to advise the
Civil Service Commission on this problem. Our representatives on this committee
of experts are: W. E. Deming, C. Eisenhart, M. H. Hansen and S. Weiss.

The advances in numerical computations in recent years have made an enlargement
and reorganization of the Committee on Tabulation necessary. Its present
members are: R. L. Anderson, C. Eisenhart (Chairman), A. M. Mood, F.
Mosteller, H. G. Romig, L. E. Simon, and J. W. Tukey. The objectives of this
committee, as outlined by the chairman, are: (1) to prepare a comprehensive
list of new mathematical tables that would be of value in statistical theory and
applications, (2) to assemble an American Collection of “Tables for Statisticians”,
(3) to prepare a list of mathematical tables of importance in statistical
theory and applications to be recommended for inclusion in the proposed National
Bureau of Standards volume of “Tables for the Occasional Computer”.
To implement the program of the committee, the following sub-committees have
been constituted: (1) “On Computing Centers” with L. E. Simon as Chairman,
(2) “On Ranks and Runs” with A. M. Mood as Chairman, (3) “On Serial Correlations”
with R. L. Anderson as Chairman, (4) “On 2x2 Tables” with C.
Eisenhart as Chairman, (5) “On Order Statistics” with F. Mosteller as Chairman,
(6) “On Binomial, Poisson, and Hypergeometric Distributions” with
H. G. Romig as Chairman, (7) “On Miscellaneous Tables” with J. W. Tukey
as Chairman.

On the recommendation of the membership committee, consisting of H.
Scheffé (chairman), C. C. Craig, P. G. Hoel and F. F. Stephan, the following
members have been elected as Fellows: J. Berkson, E. L. Lehmann, E. J. G.
Pitman, H. E. Robbins and C. M. Stein. The members of the finance committee
for 1948 were P. S. Dwyer (chairman), C. F. Roos, L. A. Knowles and
T. N. E. Greville.

The Nominating Committee for 1948 consisted of W. Bartky (chairman),
C. C. Craig, J. F. Daly, H. A. Freeman, E. L. Lehmann and W. G. Madow. The
committee nominated J. Neyman for President, J. L. Doob for President-Elect
and 24 Council members for the 12 positions to be filled. In accordance with
the provisions of the new constitution, the Nominating Committee for 1949 has
also been appointed. The members of this Committee are: W. G. Cochran
(Chairman), M. H. Hansen, H. B. Mann, A. M. Mood and H. G. Romig.

The Board of Directors has been exploring the possibilities for a closer
cooperation with our colleagues abroad and for making foreign statistical
publications more easily accessible to our members. In particular, there has
been correspondence with Professor E. S. Pearson, Managing Editor of Biometrika,
on the question of a possible reduction of the subscription rate of Biometrika
for our members. As a result of these discussions, Professor Pearson offered
certain reductions, provided that a sufficient number of subscribers can be
secured. Detailed information on this was contained in a memorandum of the
Secretary, P. S. Dwyer, in the November mailing to the membership. It is
hoped that many of our members will make use of this opportunity.

With the new constitutions of the American Statistical Association and the
Institute of Mathematical Statistics adopted, the way is cleared for the
consideration of possible federation plans of the various statistical
organizations by the Inter-Society Committee on Federation. J. H. Curtiss and
P. S. Olmstead continued to serve as our representatives on the aforementioned
committee during 1948. W. Feller was our representative on the Policy Committee
for Mathematics, and F. C. Mosteller and S. S. Wilks represented the Institute
on the Joint Committee for the Development of Statistical Application in
Engineering and Manufacturing. W. Bartky was reappointed for a three-year term
as our representative to the Division of the Physical Sciences of the National
Research Council, and H. Hotelling was our representative to the American
Association for the Advancement of Science.

In conclusion, I wish to thank all committee members and others who
participated in the work of the Institute during the past year. The heaviest
burden falls, of course, on the Secretary and it is hard to express adequately
our appreciation for his unselfish efforts and devotion. The smooth and
efficient conduct of the affairs of the Institute is largely due to his work.

Abraham Wald
President, 1948

December 31, 1948 

REPORT OF THE SECRETARY-TREASURER OF THE INSTITUTE 

FOR 1948 1 

At the beginning of 1948 the Institute had 1037 members and during the
period covered by this report 120 new members (13 of whom begin their
membership with 1949) joined the Institute and two members were re-instated.
During 1948 the Institute lost 64 members, of whom 24 were lost by resignation,
38 by suspension for non-payment of dues and 2 by death. Judging from the
information available at this date, the Institute will have 1101 members as it
starts 1949.

Deceased during the year were Dr. Otis A. Pope and H. M. Tompkins. 

Meetings of the Institute held during 1948 included those at Columbia
University on April 14-15, at the Berkeley campus of the University of
California on June 22-24, at the University of Wisconsin on September 7-10, at
the University of Washington on November 20-27, and at Cleveland on December
26-30. The Secretary wishes to call attention to the excellent work of the
members who served as assistant secretaries at these meetings: Professor
Littauer at New York, Professor Arnold at Madison, Professor Birnbaum at
Seattle and Professor Van Voorhis at Cleveland.

1 This report covers the period January 1, 1948 to December 20, 1948 as the books were
closed on December 20, 1948 so that the report could be made at the annual meeting.





A summary of the financial transactions of the Institute is given in
the Financial Statement for 1948 which follows:

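FINANCIAL STATEMENT
December 31, 1947 to December 20, 1948

A. RECEIPTS

Balance on Hand,** December 31, 1947 ....................   $5,858.37
Dues ....................................................    7,482.21
Contributions ...........................................      255.50
Subscriptions ...........................................    3,660.40
Sale of Back Numbers ....................................    2,718.27
Income from Investments .................................      100.00
Advertising .............................................      160.00
Miscellaneous ...........................................       57.24
                                                           ----------
Total ...................................................  $20,291.99

B. EXPENDITURES

Annals: Current
    Office of the Editor .....................    $175.00
    Waverly Press ............................   7,824.66   $7,999.66
Annals: Back Numbers
    Reprinted Vol. XI #2 & #3; XII #2 & #3; XIV #4 ......    1,968.50
Mathematical Reviews and Inter-Society Committee ........      225.00
Office of the Secretary-Treasurer
    Printing, memoranda, etc. (including some
        stamped envelopes) ...................   1,174.52
    Postage, supplies, express, telephone calls    225.00
    Clerical help ............................   1,468.00
    Travelling Expense .......................      30.48    2,898.00
Miscellaneous ...........................................       79.82
Balance on Hand,** December 20, 1948 ....................    7,121.01
                                                           ----------
Total ...................................................  $20,291.99

C. SUMMARY OF RECEIPTS AND EXPENDITURES

Balance on Hand,** December 31, 1947 ....................   $5,858.37
Receipts during 1948 ....................................   14,433.62
Expenditures during 1948 ................................   13,170.98
Balance on Hand,** December 20, 1948 ....................    7,121.01

** In bank deposits and government bonds.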

D. LIFE MEMBERSHIP FUNDS

It has been the practice to place all life membership payments in a special fund (most of
which is in government bonds) and to hold all these funds in reserve until the death of the
member, after which his payment is released to the general fund. There were no new life
membership payments in 1948. During the year a transfer to the general fund has been
made of the life membership payment of Professor Irving Fisher, who died in 1947.

                                   December 31, 1947   December 20, 1948
Number of Life Members ...........        30                  29
U. S. Government Bonds ...........    $1,888.00           $1,888.00
Bank Deposits ....................       427.00              392.00
Total ............................    $2,315.00           $2,280.00

E. BACK ISSUES FUND

It has been our policy, since January 1, 1948, to use income from the sale of back issues
to finance the additional reprinting of back issues.

Income from the Sale of Back Issues during 1948 .........   $2,718.27
Expense for Reprinting Back Issues in 1948 ..............    1,968.50
Balance in the Fund, December 20, 1948 ..................     $749.77

At present 500 copies of Volume 13 #1 and #2 are being reprinted at a cost of $735.00.
The payment of this in January will leave a small balance in the fund.


F. COMPARISON OF ASSETS ON DECEMBER 31, 1947 AND DECEMBER 20, 1948

                                                   1947          1948
U. S. Government G Bonds ..................   $3,000.00     $3,000.00
Life Membership Funds .....................    2,315.00      2,280.00
Back Issues Fund ..........................        none        749.77
Additional Bank Deposits ..................      543.37      1,091.24
Current Accounts Receivable ...............      423.55        291.22
Estimated Value (Cost) of Back Annals³ ....   10,866.73     12,785.61
Total .....................................  $17,148.65    $20,197.84
Net Gain 1948 .............................                 $3,049.19


G. LIABILITIES OF INSTITUTE OF MATHEMATICAL STATISTICS AS OF DECEMBER 20, 1948

All bills which have been presented have been paid. The Life Membership Fund now
contains $2,280.00 which covers 29 members. Also, $4,000.50 has been paid in for
contributions and 1949 dues and subscriptions.


This report does not cover the amount of $13.95 which is held by the Institute
for the fund for Annals for Countries Devastated by the War. (This fund has
been under the supervision of Professor Neyman.) During the year this fund
purchased $376.25 in back issues (at the agreed rate of $4.50 per volume), which
has contributed to the total sales in back issues.

There has been little change in the life membership fund during the year.
Our practice of making no transfer of life membership funds until the death
of the member is most conservative and protects the interests of the life member.

The question of the value of our inventory is always difficult. We now have
19,083 issues of the Annals. At 67¢ per copy, it appears that $12,785.61 is a
fair estimate of their actual cost. This is in fact less than 5 times the actual
income from back issues this year and hence seems to be a very conservative
estimate of the marketable (within ten years) value of our present inventory.

³ Cost of Annals calculated at 67 cents per copy.

We are in a position now to continue to supply all issues beginning with volume
7 and expect that the sales in back volumes will be such that within two or three
years we will be able to reprint the 9 issues in volumes 1-6 which are now
practically or completely exhausted.

It appears that the increase in dues and subscriptions has been adequate to 
take care of the increased expense during 1948. No bonds have been cashed 
during the year. Additional funds appear necessary for 1949, however, since 
the present amount of clerical help in the office of the Secretary-Treasurer is 
utterly inadequate. The employment of additional secretarial assistance, which 
the Institute must have, will increase the total expense of this office by about 
$1,200.00. It is necessary, too, to provide a cushion for a possible increase in 
our Waverly bill, which is up about 10% in 1948. It appears that we may 
need from $1,500.00 to $2,000.00 additional funds for 1949. Available sources 
are increases in the number of members and subscribers, contributions from 
our members, and institutional contributions and memberships. 

Paul S. Dwyer 
Secretary-Treasurer

December 21, 1948 


REPORT OF THE EDITOR FOR 1948

During 1948 the rate of submission of manuscripts for publication in the
Annals has continued to increase. The size of the Annals was held approximately
to that set for 1947, the number of pages printed in 1948 being 610.
The 1948 volume of the Annals contained 59 papers, of which 24 were short 
notes. 

During the past year the backlog of papers has increased to nearly two issues. 
Thus manuscripts submitted now, especially the longer ones, must wait at least 
six months after being refereed in order to be printed. If the rate at which 
manuscripts are submitted increases, as it has during the last two years, this 
waiting gap may increase to a year by the end of 1949. 

If additional funds could be found, it would be highly desirable to increase the 
Annals to 700 pages in 1949. 

The manuscripts being received continue to cover a rather wide range of 
topics in probability and statistics. Almost all of them are research papers. 
In the Editor’s opinion it would be highly desirable for the Institute to take steps, 
perhaps through invited addresses, to secure good expository and review articles. 
Sustained attempts have been made over a period of years to obtain such articles 
by invitation, but with little success. 

The Editor wishes to take this opportunity to acknowledge, on behalf of the
Editorial Committee, the generous refereeing assistance which has been given by
the following persons: Z. W. Birnbaum, A. H. Bowker, I. W. Burr, G. W. Brown,
K. L. Chung, W. J. Dixon, A. Dvoretzky, T. N. E. Greville, F. E. Grubbs,
M. H. Hansen, T. E. Harris, C. Hastings, H. B. Horton, G. A. Hunt, B. E.
Kimball, T. Koopmans, H. Levene, M. S. MacPhail, P. J. McCarthy, H. B.
Murphy, M. P. Peisakoff, P. S. Olmstead, E. Paulson, H. G. Romig, L. J. Savage,
F. F. Stephan, D. F. Votaw and J. E. Walsh.

The Editor owes special acknowledgment to Mr. M. E. Freeman for preparation
of manuscripts and to Mrs. Frances M. Purvis for other editorial and office
assistance.

S. S. Wilks 
Editor 


December 31, 1948. 



STATISTICAL DECISION FUNCTIONS 

By Abraham Wald 1 
Columbia University 

Introduction and summary. The foundations of a general theory of statistical
decision functions, including the classical non-sequential case as well as the
sequential case, were discussed by the author in a previous publication
[3]. Several assumptions made in [3] appear, however, to be unnecessarily
restrictive (see conditions 1-7, p. 297 in [3]). These assumptions, moreover,
are not always fulfilled for statistical problems in their conventional form. In
this paper the main results of [3], as well as several new results, are obtained
from a considerably weaker set of conditions which are fulfilled for most of the
statistical problems treated in the literature. It seemed necessary to abandon
most of the methods of proof used in [3] (particularly those in section 4 of [3])
and to develop the theory from the beginning. To make the present paper
self-contained, the basic definitions already given in [3] are briefly restated
in section 2.1.

In [3] it is postulated (see Condition 3, p. 297) that the space $\Omega$ of all
admissible distribution functions F is compact. In problems where the
distribution function F is known except for the values of a finite number of
parameters, i.e., where $\Omega$ is a parametric class of distribution functions,
the compactness condition will usually not be fulfilled if no restrictions are
imposed on the possible values of the parameters. For example, if $\Omega$ is the
class of all univariate normal distributions with unit variance, $\Omega$ is not
compact. It is true that by restricting the parameter space to a bounded and
closed subset of the unrestricted space, compactness of $\Omega$ will usually be
attained. Since such a restriction of the parameter space can frequently be made
in applied problems, the condition of compactness may not be too restrictive
from the point of view of practical applications. Nevertheless, it seems highly
desirable from the theoretical point of view to eliminate or to weaken the
condition of compactness of $\Omega$. This is done in the present paper. The
compactness condition is completely omitted in the discrete case (Theorems
2.1-2.5), and replaced by the condition of separability of $\Omega$ in the
continuous case (Theorems 3.1-3.4). The latter condition is fulfilled in most of
the conventional statistical problems.

1 Work done under the sponsorship of the Office of Naval Research.

Another restriction postulated in [3] (Condition 4, p. 297) is the continuity
of the weight function W(F, d) in F. As explained in section 2.1 of the present
paper, the value of W(F, d) is interpreted as the loss suffered when F happens to
be the true distribution of the chance variables under consideration and the
decision d is made by the statistician. While the assumption of continuity of
W(F, d) in F may seem reasonable from the point of view of practical
application, it is rather undesirable from the theoretical point of view for the
following reasons. It is of considerable theoretical interest to consider
simplified weight functions W(F, d) which can take only the values 0 and 1 (the
value 0 corresponds to a correct decision, and the value 1 to a wrong decision).
Frequently, such weight functions are necessarily discontinuous. Consider, for
example, the problem of testing the hypothesis H that the mean $\theta$ of a
normally distributed chance variable X with unit variance is equal to zero. Let
$d_1$ denote the decision to accept H, and $d_2$ the decision to reject H.
Assigning the value zero to the weight W whenever a correct decision is made, and
the value 1 whenever a wrong decision is made, we have:

    $W(\theta, d_1) = 0$ for $\theta = 0$, and $= 1$ for $\theta \neq 0$;
    $W(\theta, d_2) = 0$ for $\theta \neq 0$, and $= 1$ for $\theta = 0$.

This weight function is obviously discontinuous. In the present paper the
main results (Theorems 2.1-2.5 and Theorems 3.1-3.4) are obtained without
making any continuity assumption regarding W(F, d).
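
The discontinuity of the 0-1 weight function in the testing example can be made concrete with a small sketch. The function name `weight`, the coding of the two decisions as the integers 1 and 2, and the sample values of the mean are illustrative assumptions, not notation from the paper.

```python
def weight(theta: float, decision: int) -> int:
    """0-1 weight for testing H: theta = 0 (illustrative coding).

    decision 1 = accept H, decision 2 = reject H.
    Returns 0 for a correct decision and 1 for a wrong one.
    """
    hypothesis_true = (theta == 0.0)
    if decision == 1:
        return 0 if hypothesis_true else 1
    if decision == 2:
        return 1 if hypothesis_true else 0
    raise ValueError("decision must be 1 or 2")


# Values of theta arbitrarily close to 0 still give the opposite loss,
# which is exactly the jump at theta = 0.
thetas = [0.0, 1e-12, -1e-12, 0.5]
accept_losses = [weight(t, 1) for t in thetas]
reject_losses = [weight(t, 2) for t in thetas]
```

However small the nonzero mean is chosen, the loss from accepting H stays at 1, so no continuity requirement on W in F could accommodate this weight function.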

The restrictions imposed in the present paper on the cost function of
experimentation are considerably weaker than those formulated in [3]. Condition 6
[3, p. 297] concerning the class of admissible distribution functions, and
condition 7 [3, p. 298] concerning the class of decision functions at the
disposal of the statistician are omitted here altogether.

One of the new results obtained here is the establishment of the existence
of so-called minimax solutions under rather weak conditions (Theorems 2.3 and
3.2). This result is a simple consequence of two lemmas (Lemmas 2.4 and 3.3)
which seem to be of interest in themselves.

The present paper consists of three sections. In the first section several
theorems are given concerning zero sum two person games which go somewhat
beyond previously published results. The results in section 1 are then applied
to statistical decision functions in sections 2 and 3. Section 2 treats the case of
discrete chance variables, while section 3 deals with the continuous case. The
two cases have been treated separately, since the author was not able to find
any simple and convenient way of combining them into a single more general
theory.

1. Conditions for strict determinateness of a zero sum two person game.
The normalized form of a zero sum two person game may be defined as follows
(see [1, section 14.1]): there are two players and there is a bounded and real
valued function K(a, b) of two variables a and b given where a may be any point
of a space A and b may be any point of a space B. Player 1 chooses a point
a in A and player 2 chooses a point b in B, each choice being made in complete
ignorance of the other. Player 1 then gets the amount K(a, b) and player 2 the
amount -K(a, b). Clearly, player 1 wishes to maximize K(a, b) and player 2
wishes to minimize K(a, b).

Any element a of A will be called a pure strategy of player 1, and any element
b of B a pure strategy of player 2. A mixed strategy of player 1 is defined as
follows: instead of choosing a particular element a of A, player 1 chooses a
probability measure $\xi$ defined over an additive class $\mathfrak{A}$ of subsets
of A and the point a is then selected by a chance mechanism constructed so that
for any element $\alpha$ of $\mathfrak{A}$ the probability that the selected
element a will be contained in $\alpha$ is equal to $\xi(\alpha)$. Similarly, a
mixed strategy of player 2 is given by a probability measure $\eta$ defined over
an additive class $\mathfrak{B}$ of subsets of B and the element b is selected by
a chance mechanism so that for any element $\beta$ of $\mathfrak{B}$ the
probability that the selected element b will be contained in $\beta$ is equal to
$\eta(\beta)$. The expected value of the outcome K(a, b) is then given by

(1.1)    $K^*(\xi, \eta) = \int_B \int_A K(a, b)\, d\xi\, d\eta.$

We can now reinterpret the value of K(a, b) as the value of
$K^*(\xi_a, \eta_b)$ where $\xi_a$ and $\eta_b$ are probability measures which
assign probability 1 to a and b, respectively. In what follows, we shall write
$K(\xi, \eta)$ for $K^*(\xi, \eta)$; K(a, b) will be used synonymously with
$K(\xi_a, \eta_b)$, $K(a, \eta)$ synonymously with $K(\xi_a, \eta)$ and
$K(\xi, b)$ synonymously with $K(\xi, \eta_b)$. This can be done without any
danger of confusion. A game is said to be strictly determined if

(1.2)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$
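
For finite spaces A and B the integral in (1.1) reduces to a double sum, and the two sides of (1.2) can be compared directly. The sketch below uses the game of matching pennies as a hypothetical payoff matrix (it is not an example from the paper) and searches mixed strategies on a grid of probabilities; the names `expected_payoff`, `mix` and `grid` are illustrative.

```python
from fractions import Fraction

# Hypothetical payoff matrix K[a][b] for matching pennies (not from the paper).
K = [[1, -1],
     [-1, 1]]


def expected_payoff(xi, eta):
    """Finite form of (1.1): sum over a and b of K(a, b) * xi(a) * eta(b)."""
    return sum(xi[a] * eta[b] * K[a][b]
               for a in range(len(K)) for b in range(len(K[0])))


def mix(p):
    """Mixed strategy putting probability p on the first pure strategy."""
    return [p, 1 - p]


# Pure strategies only: the two sides of (1.2) differ.
pure_maximin = max(min(row) for row in K)
pure_minimax = min(max(K[a][b] for a in range(len(K)))
                   for b in range(len(K[0])))

# Mixed strategies on a grid of probabilities 0, 1/20, ..., 1.
grid = [Fraction(k, 20) for k in range(21)]
mixed_maximin = max(min(expected_payoff(mix(p), mix(q)) for q in grid)
                    for p in grid)
mixed_minimax = min(max(expected_payoff(mix(p), mix(q)) for p in grid)
                    for q in grid)
```

With pure strategies alone the two sides are -1 and 1; once mixed strategies are admitted both equal 0, attained at probability 1/2. Because the expected payoff is linear in each player's probability, the inner optimum over the grid is attained at an endpoint, so the grid values here are exact rather than approximate.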

The basic theorem proved by von Neumann [1] states that if A and B are 
finite the game is always strictly determined, i.e., (1.2) holds. In some previous 
publications (see [2] and [3]) the author has shown that (1.2) always holds if one 
of the spaces A and B is finite or compact in the sense of some intrinsic metric,
but does not necessarily hold otherwise. A necessary and sufficient condition 
for the validity of (1.2) was given in [2] for spaces A and B with countably many 
elements. In this section we shall give sufficient conditions as well as necessary 
and sufficient conditions for the validity of (1.2) for arbitrary spaces A and B. 
These results will then be used in later sections. 
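
A standard illustration of how (1.2) can fail when neither space is finite or compact is the game in which each player names a positive integer and the larger number wins. This example is supplied here for illustration and is not taken from the text; the payoff K, the geometric mixed strategy and the helper name are assumptions of the sketch. Against any fixed mixed strategy of player 1, player 2 can name an integer so large that player 1 almost surely loses, so the left member of (1.2) is -1; by symmetry the right member is +1.

```python
from fractions import Fraction


def K(a, b):
    """Payoff to player 1: +1 if a > b, -1 if a < b, 0 if a == b."""
    return (a > b) - (a < b)


def payoff_geometric_vs_pure(b):
    """K(xi, b) for the mixed strategy xi(a) = 2**(-a), a = 1, 2, ...

    The mass on a <= b is summed term by term; the remaining mass
    2**(-b) sits on a > b, where the payoff is +1.
    """
    below = sum(Fraction(1, 2 ** a) * K(a, b) for a in range(1, b + 1))
    above = Fraction(1, 2 ** b)
    return below + above


# As player 2 names larger and larger integers, player 1's expected
# payoff against this particular xi decreases toward -1.
values = [payoff_geometric_vs_pure(b) for b in (1, 2, 5, 10, 20)]
```

The same tail argument applies to every probability measure on the positive integers, which is why no mixed strategy rescues player 1 in this game.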

In what follows, for any subset $\alpha$ of A the symbol $\xi_\alpha$ will denote
a probability measure $\xi$ in A for which $\xi(\alpha) = 1$. Similarly, for any
subset $\beta$ of B, $\eta_\beta$ will stand for a probability measure $\eta$ in B
for which $\eta(\beta) = 1$. We shall now prove the following lemma.

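Lemma 1.1. Let $\{\alpha_i\}$ ($i = 1, 2, \dots$, ad inf.) be a sequence of
subsets of A such that $\alpha_i \subset \alpha_{i+1}$ and let
$\alpha = \sum_{i=1}^{\infty} \alpha_i$. Then

(1.3)    $\lim_{i \to \infty} \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta).$

Proof: Clearly, the limit of
$\sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta)$ exists as
$i \to \infty$ and cannot exceed the value of the right hand member in (1.3). Put

(1.4)    $\lim_{i \to \infty} \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \rho$

and

(1.5)    $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \rho + \delta \quad (\delta \geq 0).$

Suppose that $\delta > 0$. Then there exists a probability measure
$\xi_\alpha^0$ such that

(1.6)    $K(\xi_\alpha^0, \eta) \geq \rho + \frac{\delta}{2}$ for all $\eta$.

Let $\xi_{\alpha_i}^0$ be the probability measure defined as follows: for any
subset $\alpha^*$ of $\alpha_i$ we have

(1.7)    $\xi_{\alpha_i}^0(\alpha^*) = \frac{\xi_\alpha^0(\alpha^*)}{\xi_\alpha^0(\alpha_i)}.$

Then, since $\lim_{i \to \infty} \xi_\alpha^0(\alpha - \alpha_i) = 0$, we have

(1.8)    $\lim_{i \to \infty} K(\xi_{\alpha_i}^0, \eta) = K(\xi_\alpha^0, \eta)$

uniformly in $\eta$. Hence, for sufficiently large i, we have

(1.9)    $\inf_{\eta} K(\xi_{\alpha_i}^0, \eta) > \rho + \frac{\delta}{3},$

which is a contradiction to (1.4). Thus, $\delta = 0$ and Lemma 1.1 is proved.
Interchanging the role of the two players, we obtain the following lemma.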

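Lemma 1.2. Let $\{\beta_i\}$ be a sequence of subsets of B such that
$\beta_i \subset \beta_{i+1}$ and let $\sum_{i=1}^{\infty} \beta_i = \beta$. Then

(1.10)    $\lim_{i \to \infty} \inf_{\eta_{\beta_i}} \sup_{\xi} K(\xi, \eta_{\beta_i}) = \inf_{\eta_\beta} \sup_{\xi} K(\xi, \eta_\beta).$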

We shall now prove the following lemma.

Lemma 1.3. The inequality 2

(1.11)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta)$

always holds.

2 This inequality was given by v. Neumann [1] for finite spaces A and B.

Proof: For any given $\epsilon > 0$, it is possible to find probability measures
$\xi^0$ and $\eta^0$ such that

(1.12)    $\inf_{\eta} \sup_{\xi} K(\xi, \eta) \geq \sup_{\xi} K(\xi, \eta^0) - \epsilon$

and

(1.13)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} K(\xi^0, \eta) + \epsilon.$

Then we have

(1.14)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \leq \inf_{\eta} K(\xi^0, \eta) + \epsilon \leq K(\xi^0, \eta^0) + \epsilon \leq \sup_{\xi} K(\xi, \eta^0) + \epsilon \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta) + 2\epsilon.$

Since $\epsilon$ can be chosen arbitrarily small, Lemma 1.3 is proved.
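
The argument of Lemma 1.3 uses nothing about K beyond its being a bounded function of two variables, so the same inequality holds when the supremum and infimum are taken over pure strategies of a finite game. The sketch below checks this numerically on randomly generated payoff matrices; the matrix sizes, the seed and the function names are arbitrary choices made for illustration.

```python
import random


def maximin(K):
    """max over rows a of min over columns b of K[a][b]."""
    return max(min(row) for row in K)


def minimax(K):
    """min over columns b of max over rows a of K[a][b]."""
    return min(max(K[a][b] for a in range(len(K)))
               for b in range(len(K[0])))


rng = random.Random(0)  # fixed seed so the run is reproducible
gaps = []
for _ in range(1000):
    rows, cols = rng.randint(1, 6), rng.randint(1, 6)
    K = [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]
    gaps.append(minimax(K) - maximin(K))

# The gap minimax - maximin is never negative, in line with (1.11).
smallest_gap = min(gaps)
```

A positive gap in a particular matrix corresponds to a game that is not strictly determined in pure strategies; the lemma only asserts that the gap cannot be negative.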

Theorem 1.1. If $\alpha$ is a subset of A such that

    $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta)$

and

    $\inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta),$

then

    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$

Proof: Clearly,

(1.15)    $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) \leq \sup_{\xi} \inf_{\eta} K(\xi, \eta)$

and

(1.16)    $\inf_{\eta} \sup_{\xi_\alpha} K(\xi_\alpha, \eta) \leq \inf_{\eta} \sup_{\xi} K(\xi, \eta).$

If the left hand members of (1.15) and (1.16) are equal to each other and
equal to the right member of (1.16), then

(1.17)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) \geq \inf_{\eta} \sup_{\xi} K(\xi, \eta).$

Because of Lemma 1.3 the equality sign must hold and Theorem 1.1 is proved.
Interchanging the two players, we obtain from Theorem 1.1:

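Theorem 1.2. If $\beta$ is a subset of B such that

    $\sup_{\xi} \inf_{\eta_\beta} K(\xi, \eta_\beta) = \inf_{\eta_\beta} \sup_{\xi} K(\xi, \eta_\beta)$

and

    $\sup_{\xi} \inf_{\eta_\beta} K(\xi, \eta_\beta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta),$

then

    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$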

We shall now prove the following theorem.

Theorem 1.3. If $\{\alpha_i\}$ is a sequence of subsets of A such that
$\alpha_i \subset \alpha_{i+1}$ and $\sum_{i=1}^{\infty} \alpha_i = A$, and if

(1.18)    $\sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta)$

for each i, then a necessary and sufficient condition for the validity of

(1.19)    $\sup_{\xi} \inf_{\eta} K(\xi, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta)$

is that

(1.20)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta).$

Proof: Because of (1.18) and Lemma 1.1 we have

(1.21)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Hence, (1.20) implies (1.19) and (1.19) implies (1.20). This proves Theorem 1.3.

Interchanging the role of the two players, we obtain from Theorem 1.3 the
following theorem.

Theorem 1.4. If $\{\beta_i\}$ is a sequence of subsets of B such that
$\beta_i \subset \beta_{i+1}$ and $\sum_{i=1}^{\infty} \beta_i = B$, and if

    $\sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \inf_{\eta_{\beta_i}} \sup_{\xi} K(\xi, \eta_{\beta_i}),$

then a necessary and sufficient condition for the validity of (1.19) is that

(1.22)    $\lim_{i \to \infty} \sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

In [3] an intrinsic metric was introduced in the spaces A and B. The distance
of two elements $a_1$ and $a_2$ of A is defined by

(1.23)    $\delta(a_1, a_2) = \sup_{b} | K(a_1, b) - K(a_2, b) |.$

Similarly, the distance between two points $b_1$ and $b_2$ of B is defined by

(1.24)    $\delta(b_1, b_2) = \sup_{a} | K(a, b_1) - K(a, b_2) |.$
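
For a finite payoff matrix the suprema in (1.23) and (1.24) become maxima over a row or column index, which the following sketch computes. The matrix and the function names `distance_A` and `distance_B` are made up for illustration.

```python
def distance_A(K, a1, a2):
    """(1.23): largest payoff difference between rows a1 and a2."""
    return max(abs(K[a1][b] - K[a2][b]) for b in range(len(K[0])))


def distance_B(K, b1, b2):
    """(1.24): largest payoff difference between columns b1 and b2."""
    return max(abs(K[a][b1] - K[a][b2]) for a in range(len(K)))


# Hypothetical payoff matrix; rows are strategies of player 1,
# columns are strategies of player 2.
K = [[0, 1, 2],
     [0, 1, 2],
     [3, 1, 0]]

d_rows_01 = distance_A(K, 0, 1)   # identical rows
d_rows_02 = distance_A(K, 0, 2)
d_cols_01 = distance_B(K, 0, 1)
d_cols_02 = distance_B(K, 0, 2)
```

In this matrix the first two rows have the same payoffs against every column, so their distance is zero: strategies that the opponent cannot tell apart by their outcomes coincide in this metric.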

Suppose that there exists a sequence $\{\alpha_i\}$ of subsets of A such that
$\alpha_i$ is conditionally compact, $\alpha_i \subset \alpha_{i+1}$ and
$\sum_{i=1}^{\infty} \alpha_i = A$. 3 It was shown in [3] that for any
conditionally compact subset $\alpha_i$ the relation (1.18) holds. Hence,
according to Theorem 1.3, a necessary and sufficient condition for the validity
of (1.19) is that (1.20) holds for a sequence $\{\alpha_i\}$ where $\alpha_i$ is
conditionally compact, $\alpha_i \subset \alpha_{i+1}$ and
$\sum_{i=1}^{\infty} \alpha_i = A$. Similar remarks can be made concerning the
space B.

3 On conditionally compact sets, see F. Hausdorff, Mengenlehre.

The distance definitions given in (1.23) and (1.24) can be extended to the spaces
of the probability measures $\xi$ and $\eta$, respectively. That is,

(1.25)    $\delta(\xi_1, \xi_2) = \sup_{\eta} | K(\xi_1, \eta) - K(\xi_2, \eta) |$

and

(1.26)    $\delta(\eta_1, \eta_2) = \sup_{\xi} | K(\xi, \eta_1) - K(\xi, \eta_2) |.$

£ 

We shall say that a probability measure $\xi$ is discrete if there exists a
denumerable subset $\alpha$ of A such that $\xi(\alpha) = 1$. Similarly, a
probability measure $\eta$ will be said to be discrete if $\eta(\beta) = 1$ for
some denumerable subset $\beta$ of B. We shall now prove the following theorem.

Theorem 1.5. If the choice of player 1 is restricted to elements of a class C of
probability measures $\xi$ in which the class of all discrete probability
measures $\xi$ lies dense, then a necessary and sufficient condition for the game
to be strictly determined is that there exists a sequence $\{a_i\}$ of elements
of A such that

(1.27)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \inf_{\eta} \sup_{\xi} K(\xi, \eta)$

where

    $\alpha_i = \{a_1, a_2, \dots, a_i\}.$

Proof: Since the class of all discrete probability measures $\xi$ lies dense in
the class C, there exists a sequence $\alpha = \{a_i\}$ ($i = 1, 2, \dots$, ad
inf.) such that

(1.28)    $\sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Since $\alpha_i = \{a_1, \dots, a_i\}$ is finite, we have

(1.29)    $\inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_{\alpha_i}} \inf_{\eta} K(\xi_{\alpha_i}, \eta).$

It then follows from Lemma 1.1 that

(1.30)    $\lim_{i \to \infty} \inf_{\eta} \sup_{\xi_{\alpha_i}} K(\xi_{\alpha_i}, \eta) = \sup_{\xi_\alpha} \inf_{\eta} K(\xi_\alpha, \eta) = \sup_{\xi} \inf_{\eta} K(\xi, \eta).$

Clearly, (1.30) and strict determinateness of the game imply (1.27). On the
other hand, any $\alpha = \{a_i\}$ that satisfies (1.27) will satisfy also (1.28)
and (1.30). But (1.27) and (1.30) imply that the game is strictly determined.
Thus, Theorem 1.5 is proved.

Theorem 1.6. If the choice of player 2 is restricted to elements of a class C of
probability measures $\eta$ in which the class of all discrete probability
measures $\eta$ lies dense, then a necessary and sufficient condition for the
strict determinateness of the game is that there exists a sequence
$\beta = \{b_i\}$ of elements of B such that

(1.31)    $\lim_{i \to \infty} \sup_{\xi} \inf_{\eta_{\beta_i}} K(\xi, \eta_{\beta_i}) = \sup_{\xi} \inf_{\eta} K(\xi, \eta)$

where

    $\beta_i = \{b_1, \dots, b_i\}.$

This theorem is obtained from Theorem 1.5 by interchanging the players
1 and 2.





2. Statistical decision functions: the case of discrete chance variables.

2.1. The problem of statistical decisions and its interpretation as a zero sum two
person game. In some previous publications (see, for example, [3]) the author
has formulated the problem of statistical decisions as follows: Let
$X = \{X^i\}$ ($i = 1, 2, \dots$, ad inf.) be an infinite sequence of chance
variables. Any particular observation x on X is given by a sequence
$x = \{x^i\}$ of real values where $x^i$ denotes the observed value of $X^i$.
Suppose that the probability distribution F(x) of X is not known. It is,
however, known that F is an element of a given class $\Omega$ of distribution
functions. There is, furthermore, a space D given whose elements d represent the
possible decisions that can be made in the problem under consideration. Usually
each element d of D will be associated with a certain subset $\omega$ of
$\Omega$ and making the decision d can be interpreted as accepting the
hypothesis that the true distribution is included in the subset $\omega$. The
fundamental problem in statistics is to give a rule for making a decision, that
is, a rule for selecting a particular element d of D on the basis of the observed
sample point x. In other words, the problem is to construct a function d(x),
called decision function, which associates with each sample point x an element
d(x) of D so that the decision d(x) is made when the sample point x is observed.

This formulation of the problem includes the sequential as well as the classical
non-sequential case. For any sample point x, let n(x) be the number of
components of x that must be known to be able to determine the value of d(x). In
other words, n(x) is the smallest positive integer such that d(y) = d(x) for any
y whose first n coordinates are equal to the first n coordinates of x. If no
finite n exists with the above property, we put $n = \infty$. Clearly, n(x) is
the number of observations needed to reach a decision. To put in evidence the
dependence of n(x) on the decision rule used, we shall occasionally write
$n(x; \mathfrak{D})$ instead of n(x) where $\mathfrak{D}$ denotes the decision
function d(x) used. If n(x) is constant over the whole sample space, we have the
classical case, that is the case where a decision is to be made on the basis of
a predetermined number of observations. If n(x) is not constant over the sample
space, we have the sequential case. A basic question in statistics is this: What
decision function should be chosen by the statistician in any given problem? To
set up principles for a proper choice of a decision function, it is necessary to
express in some way the degree of importance of the various wrong decisions that
can be made in the problem under consideration. This may be expressed by a
non-negative function W(F, d), called weight function, which is defined for all
elements F of $\Omega$ and all elements d of D. For any pair (F, d), the value
W(F, d) expresses the loss caused by making the decision d when F is the true
distribution of X. For any positive integer n, let c(n) denote the cost of
making n observations. If the decision function $\mathfrak{D} = d(x)$ is used
the expected loss plus the expected cost of experimentation is given by

(2.1)    $r[F, \mathfrak{D}] = \int_M W[F, d(x)]\, dF(x) + \int_M c(n(x))\, dF(x)$

where M denotes the sample space, i.e. the totality of all sample points x. We
shall use the symbol $\mathfrak{D}$ for d(x) when we want to indicate that we
mean the whole decision function and not merely a value of d(x) corresponding to
some x.

The above expression (2.1) is called the risk. Thus, the risk is a real valued
non-negative function of two variables F and $\mathfrak{D}$ where F may be any
element of $\Omega$ and $\mathfrak{D}$ any decision rule that may be adopted by
the statistician.
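
The risk (2.1) can be evaluated by direct enumeration in a small discrete problem. The sketch below is a made-up example, not one from the paper: the observations are assumed independent with values 1 and 2, the unknown quantity is the probability p of the value 2, there are two decisions (1 for "p is small", 2 for "p is large"), the weight is 0 or 1, the cost is a fixed amount per observation, and the decision rule is a majority vote on a predetermined number n of observations, so that n(x) = n.

```python
from fractions import Fraction
from itertools import product


def risk(p, n, cost_per_obs, correct_decision):
    """r[F, D] of (2.1) for a fixed-sample-size majority rule (illustrative).

    p                : probability that an observation equals 2 under F
    n                : predetermined number of observations, n(x) = n
    cost_per_obs     : assumed linear cost, c(n) = cost_per_obs * n
    correct_decision : the decision (1 or 2) with zero weight under F
    """
    total = Fraction(0)
    for x in product((1, 2), repeat=n):           # all sample points of length n
        twos = sum(1 for v in x if v == 2)
        prob = p ** twos * (1 - p) ** (n - twos)  # probability of x under F
        decision = 2 if 2 * twos > n else 1       # d(x): majority vote
        loss = 0 if decision == correct_decision else 1   # W[F, d(x)]
        total += prob * (loss + cost_per_obs * n)         # loss plus c(n(x))
    return total


c = Fraction(1, 100)
risk_small_p = risk(Fraction(1, 4), 3, c, correct_decision=1)
risk_large_p = risk(Fraction(3, 4), 3, c, correct_decision=2)
risk_one_obs = risk(Fraction(1, 4), 1, c, correct_decision=1)
```

With three observations the probability of a wrong decision is 5/32 under either distribution, so the risk is 5/32 + 3/100 = 149/800; with a single observation it is 1/4 + 1/100. The rule that is better therefore depends on how the cost of experimentation compares with the loss from a wrong decision.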

Of course, the statistician would like to make the risk r as small as possible.
The difficulty he faces in this connection is that r depends on two arguments F
and $\mathfrak{D}$, and he can merely choose $\mathfrak{D}$ but not F. The true
distribution F is chosen, we may say, by Nature and Nature's choice is usually
entirely unknown to the statistician. Thus, the situation that arises here is
very similar to that of a zero sum two person game. As a matter of fact, the
statistical problem may be interpreted as a zero sum two person game by setting
up the following correspondence.

Two Person Game                                  | Statistical Decision Problem
-------------------------------------------------+-------------------------------------------------
Player 1                                         | Nature
Player 2                                         | Statistician
Pure strategy $a$ of player 1                    | Choice of true distribution $F$ by Nature
Pure strategy $b$ of player 2                    | Choice of decision rule $\mathfrak{D} = d(x)$
Space $A$                                        | Space $\Omega$
Space $B$                                        | Space $Q$ of decision rules $\mathfrak{D}$ that can be used by the statistician
Outcome $K(a, b)$                                | Risk $r(F, \mathfrak{D})$
Mixed strategy $\xi$ of player 1                 | Probability measure $\xi$ defined over an additive class of subsets of $\Omega$ (a priori probability distribution in the space $\Omega$)
Mixed strategy $\eta$ of player 2                | Probability measure $\eta$ defined over an additive class of subsets of the space $Q$. We shall refer to $\eta$ as randomized decision function.
Outcome $K(\xi, \eta)$ when mixed strategies are used | Risk $r(\xi, \eta) = \int_Q \int_\Omega r(F, \mathfrak{D})\, d\xi\, d\eta$

2.2. Formulation of some conditions concerning the spaces $\Omega$, $D$, the weight function
$W(F, d)$ and the cost function of experimentation. A general theory of statistical
decision functions was developed in [3] assuming the fulfillment of seven
conditions listed on pp. 297-8.^4 The conditions listed there are unnecessarily
restrictive and we shall replace them here by a considerably weaker set of conditions.

^4 In [3] only the continuous case is treated (existence of a density function is assumed),
but all the results obtained there can be extended without difficulty to the discrete case.

In this chapter we shall restrict ourselves to the study of the case where each
of the chance variables $X^1, X^2, \dots$, ad inf. is discrete. We shall say that a chance


174 


ABRAHAM WALD 


variable is discrete if it can take only countably many different values. Let
$a_{i1}, a_{i2}, \dots$, ad inf. denote the possible values of the chance variable $X^i$. Since
it is immaterial how the values $a_{ij}$ are labeled, there is no loss of generality in
putting $a_{ij} = j$ ($j = 1, 2, 3, \dots$, ad inf.). Thus, we formulate the following
condition.

Condition 2.1. The chance variable $X^i$ ($i = 1, 2, \dots$, ad inf.) can take only
positive integral values.

As in [3], also here we postulate the boundedness of the weight function, i.e.,
we formulate the following condition.

Condition 2.2. The weight function $W(F, d)$ is a bounded function of $F$ and $d$.

To formulate Condition 2.3, we shall introduce some definitions. Let $\omega$ be a
given subset of $\Omega$. The distance between two elements $d_1$ and $d_2$ of $D$ relative to
$\omega$ is defined by

(2.2)    $\delta(d_1, d_2; \omega) = \sup_{F \in \omega} |W(F, d_1) - W(F, d_2)|$.

We shall refer to $\delta(d_1, d_2; \Omega)$ as the absolute distance, or more briefly, the distance
between $d_1$ and $d_2$. We shall say that a subset $D^*$ of $D$ is compact (conditionally
compact) relative to $\omega$, if it is compact (conditionally compact) in
the sense of the metric $\delta(d_1, d_2; \omega)$. If $D^*$ is compact relative to $\Omega$, we shall
say briefly that $D^*$ is compact.
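
For a finite $\Omega$ and a finite $D$ the supremum in (2.2) is a maximum over a table of weights. The sketch below assumes such a finite table; the name `distance` and the entries of `W` are hypothetical and serve only to illustrate the definition.

```python
def distance(W, d1, d2, omega):
    """delta(d1, d2; omega) = sup over F in omega of |W(F, d1) - W(F, d2)|."""
    return max(abs(W[F][d1] - W[F][d2]) for F in omega)


# Hypothetical weight table W[F][d] with three distributions and three decisions.
W = {
    "F1": {"a": 0.0, "b": 1.0, "c": 0.5},
    "F2": {"a": 1.0, "b": 0.0, "c": 0.5},
    "F3": {"a": 0.2, "b": 0.2, "c": 0.9},
}

absolute = distance(W, "a", "c", list(W))  # relative to all of Omega
relative = distance(W, "a", "b", ["F3"])   # relative to the subset omega = {F3}
```

In this example the distance between the distinct decisions `a` and `b` relative to `{F3}` is zero, which shows that the distance relative to a proper subset of $\Omega$ need not separate points of $D$.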

An element $d$ of $D$ is said to be uniformly better than the element $d'$ of $D$ relative
to a subset $\omega$ of $\Omega$ if

$W(F, d) \le W(F, d')$ for all $F$ in $\omega$

and if

$W(F, d) < W(F, d')$ for at least one $F$ in $\omega$.

A subset $D^*$ of $D$ is said to be complete relative to a subset $\omega$ of $\Omega$ if for any $d$
outside $D^*$ there exists an element $d^*$ in $D^*$ such that $d^*$ is uniformly better than
$d$ relative to $\omega$.
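
The two definitions above translate directly into checks on a finite weight table. This is a sketch under the assumption that $\Omega$ and $D$ are finite; the function names and the table `W` are hypothetical.

```python
def uniformly_better(W, d, d_prime, omega):
    """True if d is uniformly better than d_prime relative to omega."""
    never_worse = all(W[F][d] <= W[F][d_prime] for F in omega)
    somewhere_strict = any(W[F][d] < W[F][d_prime] for F in omega)
    return never_worse and somewhere_strict


def is_complete(W, D_star, D, omega):
    """True if every d outside D_star is beaten by some element of D_star."""
    return all(
        any(uniformly_better(W, d_star, d, omega) for d_star in D_star)
        for d in D
        if d not in D_star
    )


# Hypothetical weight table: decision "c" is never better than "a".
W = {
    "F1": {"a": 0.0, "b": 1.0, "c": 1.0},
    "F2": {"a": 1.0, "b": 0.0, "c": 1.0},
}
D = ["a", "b", "c"]
omega = ["F1", "F2"]

complete_ab = is_complete(W, ["a", "b"], D, omega)  # "c" is beaten by "a"
complete_a = is_complete(W, ["a"], D, omega)        # "b" is not beaten by "a"
```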

Condition 2.3. For any positive integer $i$ and for any positive $\epsilon$ there exists a
subset $D^*_{i,\epsilon}$ of $D$ which is compact relative to $\Omega$ and complete relative to $\omega_{i,\epsilon}$, where
$\omega_{i,\epsilon}$ is the class of all elements $F$ of $\Omega$ for which prob $\{X^1 = i\} \ge \epsilon$.

If $D$ is compact, then it is compact with respect to any subset $\omega$ of $\Omega$ and Condition
2.3 is fulfilled. For any finite space $D$, Condition 2.3 is obviously fulfilled.
Thus, Condition 2.3 is fulfilled, for example, for any problem of testing
a statistical hypothesis $H$, since in that case the space $D$ contains only two elements
$d_1$ and $d_2$, where $d_1$ denotes the decision to reject $H$ and $d_2$ the decision to
accept $H$.

In [3] it was assumed that the cost of experimentation depends only on the
number of observations made. This assumption is unnecessarily restrictive.
The cost may depend also on the decision rule $\mathfrak{D}$ used. For example, let $\mathfrak{D}_1$
and $\mathfrak{D}_2$ be two decision rules such that $n(x; \mathfrak{D}_1)$ is equal to a constant $n_0$, while





$\mathfrak{D}_2$ is such that at any stage of the experimentation where $\mathfrak{D}_2$ requires taking at
least one additional observation the probability is positive that experimentation
will be terminated by taking only one more observation. Let $x^0$ be a particular
sample point for which $n(x^0; \mathfrak{D}_2) = n(x^0; \mathfrak{D}_1) = n_0$. There are undoubtedly
cases where the cost of experimentation is appreciably increased by the necessity
of having to look at the observations at each stage of the experiment before we
can decide whether or not to continue taking additional observations. Thus
in many cases the cost of experimentation when $x^0$ is observed may be greater
for $\mathfrak{D}_2$ than for $\mathfrak{D}_1$. The cost may also depend on the actual values of the observations
made. Thus, we shall assume that the cost $c$ is a single valued function
of the observations $x^1, \dots, x^m$ and the decision rule $\mathfrak{D}$ used, i.e.,
$c = c(x^1, \dots, x^m, \mathfrak{D})$.

Condition 2.4. The cost $c(x^1, \dots, x^m, \mathfrak{D})$ is non-negative and
$\lim c(x^1, \dots, x^m, \mathfrak{D}) = \infty$ uniformly in $x^1, \dots, x^m, \mathfrak{D}$ as $m \to \infty$. For each positive
integral value $m$, there exists a finite value $c_m$, depending only on $m$, such
that $c(x^1, \dots, x^m, \mathfrak{D}) \le c_m$ identically in $x^1, \dots, x^m, \mathfrak{D}$. Furthermore,
$c(x^1, \dots, x^m, \mathfrak{D}_1) = c(x^1, \dots, x^m, \mathfrak{D}_2)$ if $n(x; \mathfrak{D}_1) = n(x; \mathfrak{D}_2)$ for all $x$. Finally,
for any sample point $x$ we have

$c(x^1, \dots, x^{n(x, \mathfrak{D}_1)}, \mathfrak{D}_1) \le c(x^1, \dots, x^{n(x, \mathfrak{D}_2)}, \mathfrak{D}_2)$

if there exists a positive integer $m$ such that $n(x, \mathfrak{D}_1) = n(x, \mathfrak{D}_2)$ when $n(x, \mathfrak{D}_2) < m$
and $n(x, \mathfrak{D}_1) = m$ when $n(x, \mathfrak{D}_2) \ge m$.

2.3. Alternative definition of a randomized decision function, and a further condition
on the cost function. In Section 2.1 we defined a randomized decision
function as a probability measure $\eta$ defined over some additive class of subsets
of the space $Q$ of all decision functions $d(x)$. Before formulating an alternative
definition of a randomized decision function, we have to make precise the meaning
of $\eta$ by stating the additive class $C_Q$ of subsets of $Q$ over which $\eta$ is defined.
Let $C_D$ be the smallest additive class of subsets of $D$ which contains all subsets
of $D$ which are open in the sense of the metric $\delta(d_1, d_2; \Omega)$. For any finite set of
positive integers $a_1, \dots, a_k$ and for any element $D^*$ of $C_D$, let $Q(a_1, \dots, a_k, D^*)$
be the set of all decision functions $d(x)$ which satisfy the following two conditions:
(1) If $x^1 = a_1, x^2 = a_2, \dots, x^k = a_k$, then $n(x) = k$; (2) If $x^1 = a_1, \dots,
x^k = a_k$, then $d(x)$ is an element of $D^*$. Let $C_Q^*$ be the class of all sets
$Q(a_1, \dots, a_k, D^*)$ corresponding to all possible values of $k, a_1, \dots, a_k$ and all possible
elements $D^*$ of $C_D$. The additive class $C_Q$ is defined as the smallest
additive class containing $C_Q^*$ as a subclass. Then with any $\eta$ we can associate
two sequences of functions

$\{z_m(x^1, \dots, x^m \mid \eta)\}$

and

$\{\delta_{x^1 \cdots x^m}(D^* \mid \eta)\}$ ($m = 1, 2, \dots$, ad inf.)

where $0 \le z_m(x^1, \dots, x^m \mid \eta) \le 1$ and for any $x^1, \dots, x^m$, $\delta_{x^1 \cdots x^m}$ is a probability
measure in $D$ defined over the additive class $C_D$. Here

$z_m(x^1, \dots, x^m \mid \eta)$





denotes the conditional probability that $n(x) > m$ under the condition that
the first $m$ observations are equal to $x^1, \dots, x^m$ and experimentation has not
been terminated for $(x^1, \dots, x^k)$ for ($k = 1, 2, \dots, m - 1$), while

$\delta_{x^1 \cdots x^m}(D^* \mid \eta)$

is the conditional probability that the final decision $d$ will be an element of $D^*$
under the condition that the sample $(x^1, \dots, x^m)$ is observed and $n(x) = m$.
Thus

(2.3)    $z_1(x^1 \mid \eta)\, z_2(x^1, x^2 \mid \eta) \cdots z_{m-1}(x^1, \dots, x^{m-1} \mid \eta)\, [1 - z_m(x^1, \dots, x^m \mid \eta)] = \eta[Q(x^1, \dots, x^m, D)]$

and

(2.4)    $\delta_{x^1 \cdots x^m}(D^* \mid \eta) = \dfrac{\eta[Q(x^1, \dots, x^m, D^*)]}{\eta[Q(x^1, \dots, x^m, D)]}$.


We shall now consider two sequences of functions $\{z_m(x^1, \dots, x^m)\}$ and
$\{\delta_{x^1 \cdots x^m}(D^*)\}$, not necessarily generated by a given $\eta$. An alternative definition
of a randomized decision function can be given in terms of these two sequences
as follows: After the first observation $x^1$ has been drawn, the statistician determines
whether or not experimentation be continued by a chance mechanism
constructed so that the probability of continuing experimentation is equal to
$z_1(x^1)$. If it is decided to terminate experimentation, the statistician uses a
chance mechanism to select the final decision $d$ constructed so that the probability
distribution of the selected $d$ is equal to $\delta_{x^1}(D^*)$. If it is decided to take
a second observation and the value $x^2$ is obtained, again a chance mechanism is
used to determine whether or not to stop experimentation such that the probability
of taking a third observation is equal to $z_2(x^1, x^2)$. If it is decided to stop
experimentation, a chance mechanism is used to select the final $d$ so that the
probability distribution of the selected $d$ is equal to $\delta_{x^1 x^2}(D^*)$, and so on.
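
The procedure just described can be sketched as a short simulation. The sketch assumes a finite decision space, so that each $\delta_{x^1 \cdots x^m}$ is a finite table of probabilities; the names `run_randomized_rule`, `z` and `delta`, and the particular stopping and decision probabilities, are hypothetical.

```python
import random


def run_randomized_rule(observations, z, delta, rng):
    """Apply the two chance mechanisms to a stream of observations.

    observations -- iterable yielding x^1, x^2, ...
    z            -- function: tuple (x^1..x^m) -> probability of continuing
    delta        -- function: tuple (x^1..x^m) -> dict {decision: probability}
    rng          -- random.Random instance driving both chance mechanisms
    Returns the sample observed up to stopping and the selected decision.
    """
    sample = ()
    for x in observations:
        sample += (x,)
        if rng.random() < z(sample):
            continue  # chance mechanism says: take another observation
        decisions, weights = zip(*delta(sample).items())
        return sample, rng.choices(decisions, weights=weights)[0]
    raise RuntimeError("observations exhausted before the rule stopped")


# Hypothetical rule: always continue up to three observations, then stop
# and accept when at least two of the observed values equal 1.
def z(sample):
    return 1.0 if len(sample) < 3 else 0.0


def delta(sample):
    return {"accept": 1.0} if sum(sample) >= 2 else {"reject": 1.0}


sample, decision = run_randomized_rule(iter([1, 0, 1, 1]), z, delta, random.Random(0))
```

With values of `z` strictly between 0 and 1, or non-degenerate tables returned by `delta`, the same function gives a genuinely randomized rule.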

We shall denote by $\zeta$ a randomized decision function defined in terms of two
sequences $\{z_m(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \cdots x^m}(D^*)\}$, as described above. Clearly,
any given $\eta$ generates a particular $\zeta$. Let $\zeta(\eta)$ denote the $\zeta$ generated by $\eta$.
One can easily verify that two different $\eta$'s may generate the same $\zeta$, i.e., there
exist two different $\eta$'s, say $\eta_1$ and $\eta_2$, such that $\zeta(\eta_1) = \zeta(\eta_2)$.

We shall now show that for any $\zeta$ there exists an $\eta$ such that $\zeta(\eta) = \zeta$. Let
$\zeta$ be given by the two sequences $\{z_m(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \cdots x^m}(D^*)\}$. Let $b_j$
denote a sequence of $r_j$ positive integers, i.e., $b_j = (b_{j1}, \dots, b_{j r_j})$ ($j = 1, 2, \dots, k$)
subject to the restriction that no $b_j$ is equal to an initial segment of $b_l$ ($j \ne l$).
Let, furthermore, $D_1^*, \dots, D_k^*$ be $k$ elements of $C_D$. Finally, let
$Q(b_1, \dots, b_k, D_1^*, \dots, D_k^*)$ denote the class of all decision functions $d(x)$ which satisfy





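the following condition: If $(x^1, \dots, x^{r_j}) = b_j$ then $n(x) = r_j$ and $d(x)$ is an element
of $D_j^*$ ($j = 1, \dots, k$). Let $\eta$ be a probability measure such that

(2.5)    $\eta[Q(b_1, \dots, b_k, D_1^*, \dots, D_k^*)] = \delta_{b_1}(D_1^*) \cdots \delta_{b_k}(D_k^*) \prod_{m=1}^{\infty} \prod_{x^m=1}^{\infty} \cdots \prod_{x^1=1}^{\infty} \{ z_m(x^1, \dots, x^m)^{g_m(x^1, \dots, x^m)} [1 - z_m(x^1, \dots, x^m)]^{g_m^*(x^1, \dots, x^m)} \}$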

holds for all values of $k, b_1, \dots, b_k, D_1^*, \dots, D_k^*$. Here $g_m(x^1, \dots, x^m) = 1$
if $(x^1, \dots, x^m)$ is equal to an initial segment of at least one of the samples
$b_1, \dots, b_k$, but is not equal to any of the samples $b_1, \dots, b_k$. In all other
cases $g_m(x^1, \dots, x^m) = 0$. The function $g_m^*(x^1, \dots, x^m)$ is equal to 1 if
$(x^1, \dots, x^m)$ is equal to one of the samples $b_1, \dots, b_k$, and zero otherwise.
Clearly, for any $\eta$ which satisfies (2.5) we have $\zeta(\eta) = \zeta$. The existence of such an
$\eta$ can be shown as follows. With any finite set of positive integers $i_1, \dots, i_r$
we associate an elementary event, say $A_r(i_1, \dots, i_r)$. Let $\bar{A}_r(i_1, \dots, i_r)$
denote the negation of the event $A_r(i_1, \dots, i_r)$. Thus, we have a denumerable
system of elementary events by letting $r, i_1, \dots, i_r$ take any positive integral
values. We shall assume that the events $A_1(1), A_1(2), \dots$, ad inf. are independent
and the probability that $A_1(i)$ happens is equal to $z_1(i)$. We shall now
define the conditional probability of $A_2(i, j)$ knowing for any $k$ whether $A_1(k)$
or $\bar{A}_1(k)$ happened. If $A_1(i)$ happened, the conditional probability of $A_2(i, j)$ is
$z_2(i, j)$, and 0 otherwise. The conditional probability of the joint event that
$A_2(i_1, j_1), A_2(i_2, j_2), \dots, A_2(i_r, j_r), \bar{A}_2(i_{r+1}, j_{r+1}), \dots$, and $\bar{A}_2(i_{r+s}, j_{r+s})$ will
happen is the product of the conditional probabilities of each of these events
(knowing for each $i$ whether $A_1(i)$ or $\bar{A}_1(i)$ happened). Similarly, the conditional
probability (knowing for any $i$ and for any $(i, j)$, whether the corresponding
event $A_2(i, j)$ happened or not) that $A_3(i_1, j_1, k_1)$ and $A_3(i_2, j_2, k_2)$ and
$\cdots$ $A_3(i_r, j_r, k_r)$ and $\bar{A}_3(i_{r+1}, j_{r+1}, k_{r+1})$ and $\cdots$ and $\bar{A}_3(i_{r+s}, j_{r+s}, k_{r+s})$ will
simultaneously happen is equal to the product of the conditional probabilities
of each of them. The conditional probability of $A_3(i, j, k)$ is equal to $z_3(i, j, k)$
if $A_1(i)$ and $A_2(i, j)$ happened, and zero otherwise; and so on. Clearly, this
system of probabilities is consistent.

If we interpret $A_r(i_1, \dots, i_r)$ as the event that the decision function $\mathfrak{D} = d(x)$
selected by the statistician has the property that $n(x; \mathfrak{D}) > r$ when $x^1 = i_1, \dots,
x^r = i_r$, the above defined system of probabilities for the denumerable
sequence $\{A_r(i_1, \dots, i_r)\}$ of events implies the validity of (2.5) for $D_j^* = D$
($j = 1, \dots, k$). The consistency of the formula (2.5) for $D_j^* = D$ implies,
as can easily be verified, the consistency of (2.5) also in the general case when
$D_j^* \ne D$.
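
In the special case $D_j^* = D$ all factors $\delta_{b_j}(D_j^*)$ equal 1, and the right-hand side of (2.5) reduces to a finite product: one factor $z_m$ for every proper initial segment of some $b_j$ and one factor $1 - z_m$ for every $b_j$ itself. A minimal sketch of that product follows; the name `eta_of_cylinder` and the constant continuation probability in the example are hypothetical.

```python
from math import prod


def eta_of_cylinder(z, samples):
    """Right-hand side of (2.5) when every D_j* equals D (all delta factors are 1).

    z       -- function: tuple (x^1..x^m) -> probability of continuing past stage m
    samples -- list of tuples b_1, ..., b_k, none an initial segment of another
    """
    terminal = set(samples)
    # Each proper initial segment contributes exactly once, however many b_j share it.
    proper_prefixes = {b[:m] for b in samples for m in range(1, len(b))}
    if terminal & proper_prefixes:
        raise ValueError("no b_j may be an initial segment of another b_l")
    continue_part = prod(z(s) for s in proper_prefixes)  # factors with g_m = 1
    stop_part = prod(1 - z(s) for s in terminal)         # factors with g*_m = 1
    return continue_part * stop_part


# Hypothetical rule that continues with probability 1/2 at every stage.
value = eta_of_cylinder(lambda s: 0.5, [(1,), (2, 1)])
```

For the two samples $(1)$ and $(2, 1)$ the product is $[1 - z_1(1)]\, z_1(2)\, [1 - z_2(2, 1)]$, which is $1/8$ for the constant value $1/2$.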

Let $\zeta_i$ be given by the sequences $\{z_{m,i}(x^1, \dots, x^m)\}$ and $\{\delta_{x^1 \cdots x^m, i}\}$
($m = 1, 2, \dots$, ad inf.). Let, furthermore, $\zeta$ be given by $\{z_m(x^1, \dots, x^m)\}$ and
$\{\delta_{x^1 \cdots x^m}\}$. We shall say that

(2.6)    $\lim_{i \to \infty} \zeta_i = \zeta$





if for any $m, x^1, \dots, x^m$ we have

(2.7)    $\lim_{i \to \infty} z_{m,i}(x^1, \dots, x^m) = z_m(x^1, \dots, x^m)$

and

(2.8)    $\lim_{i \to \infty} \delta_{x^1 \cdots x^m, i}(D^*) = \delta_{x^1 \cdots x^m}(D^*)$

for any open subset $D^*$ of $D$ whose boundary has probability measure zero according
to the limit probability measure $\delta_{x^1 \cdots x^m}$.

In addition to Condition 2.4, we shall impose the following continuity condition
on the cost function.

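Condition 2.5. If

$\lim_{i \to \infty} \zeta(\eta_i) = \zeta(\eta_0)$,

then

$\lim_{i \to \infty} \int_{Q(x^1, \dots, x^m)} c(x^1, \dots, x^m, \mathfrak{D})\, d\eta_i = \int_{Q(x^1, \dots, x^m)} c(x^1, \dots, x^m, \mathfrak{D})\, d\eta_0$,

where $Q(x^1, \dots, x^m)$ is the class of all decision functions $\mathfrak{D}$ for which $n(y, \mathfrak{D}) = m$
if $y^1 = x^1, \dots, y^m = x^m$.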

2.4. The main theorem. In this section we shall show that the statistical
decision problem, viewed as a zero sum two person game, is strictly determined.
It will be shown in subsequent sections that this basic theorem has many important
consequences for the theory of statistical decision functions. A precise
formulation of the theorem is as follows:

Theorem 2.1. If Conditions 2.1-2.5 are fulfilled, the decision problem, viewed
as a zero sum two person game, is strictly determined, i.e.,

(2.9)    $\sup_{\xi} \inf_{\eta} r(\xi, \eta) = \inf_{\eta} \sup_{\xi} r(\xi, \eta)$.

To prove the above theorem, we shall first derive several lemmas.
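
The content of (2.9) can be illustrated on a finite game, where strict determinateness in mixed strategies is the classical minimax theorem. The sketch below is only an illustration under that finiteness assumption and is unrelated to the method of proof used here: the 2 by 2 risk table is hypothetical, and the optimization is a plain grid search over mixing probabilities.

```python
# Hypothetical 2x2 risk table r(F, D): rows are Nature's choices, columns the statistician's.
RISK = [[0.0, 2.0],
        [3.0, 1.0]]

GRID = [k / 1000 for k in range(1001)]  # candidate mixing probabilities


def mixed_risk(p, q):
    """r(xi, eta) for xi = (p, 1 - p) over rows and eta = (q, 1 - q) over columns."""
    xi, eta = (p, 1 - p), (q, 1 - q)
    return sum(xi[i] * eta[j] * RISK[i][j] for i in range(2) for j in range(2))


# Pure strategies only: the two sides of (2.9) differ.
sup_inf_pure = max(min(row) for row in RISK)
inf_sup_pure = min(max(RISK[i][j] for i in range(2)) for j in range(2))

# Mixed strategies: r is linear in each argument, so the inner extremum
# is attained at a pure strategy of the other player.
sup_inf_mixed = max(min(mixed_risk(p, 1.0), mixed_risk(p, 0.0)) for p in GRID)
inf_sup_mixed = min(max(mixed_risk(1.0, q), mixed_risk(0.0, q)) for q in GRID)
```

For this table the pure-strategy values are 1 and 2, while both mixed-strategy values equal 1.5, attained at $\xi = (1/2, 1/2)$ and $\eta = (1/4, 3/4)$.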

Lemma 2.1. For any $\epsilon > 0$, there exists a positive integer $m_\epsilon$, depending only
on $\epsilon$, such that the value of $\sup_{\xi} \inf_{\eta} r(\xi, \eta)$ is not changed by more than $\epsilon$ if we
restrict the choice of the statistician to decision functions $d(x)$ for which $n(x) \le m_\epsilon$
for all $x$.

Proof: Put $W_0 = \sup_{F, d} W(F, d)$ and choose $m_\epsilon$ so that

(2.10)    $c(x^1, \dots, x^m, \mathfrak{D}) > \dfrac{W_0^2}{\epsilon}$

identically in $x^1, \dots, x^m$ and $\mathfrak{D}$ for all $m \ge m_\epsilon$. The existence of such a value
$m_\epsilon$ follows from Condition 2.4. Consider the function $\inf_{\mathfrak{D}} r(\xi, \mathfrak{D})$. Our lemma
is proved, if we can show that for any $\xi$, the value of $\inf_{\mathfrak{D}} r(\xi, \mathfrak{D})$ is not increased





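by more than $\epsilon$ if we restrict $\mathfrak{D}$ to be such that $n(x, \mathfrak{D}) \le m_\epsilon$ for all $x$. The latter
statement is proved, if we can show that for any decision function $\mathfrak{D}_1 = d_1(x)$
we can find another decision function $\mathfrak{D}_2 = d_2(x)$ such that $n(x, \mathfrak{D}_2) \le m_\epsilon$ for
all $x$ and $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1) + \epsilon$. There are two cases to be considered: (a)
prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} \ge \epsilon / W_0$ and (b) prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} < \epsilon / W_0$.
In case (a) the expected cost alone exceeds $(\epsilon / W_0)(W_0^2 / \epsilon) = W_0$ because of (2.10), so
we have $r(\xi, \mathfrak{D}_1) \ge W_0$. In this case we can choose $\mathfrak{D}_2$ to be the rule
that we decide for some element $d_0$ of $D$ without taking any observations.
Clearly, for this choice of $\mathfrak{D}_2$ we shall have $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1)$. In case (b)
we choose $\mathfrak{D}_2$ as follows:

$d_2(x) = d_1(x)$ whenever $n(x, \mathfrak{D}_1) \le m_\epsilon$;

$d_2(x) = d_0$ whenever $n(x, \mathfrak{D}_1) > m_\epsilon$,

where $d_0$ is an arbitrary element of $D$. Thus, $n(x, \mathfrak{D}_2) \le m_\epsilon$ for all $x$. Since
prob $\{n(X, \mathfrak{D}_1) > m_\epsilon \mid \xi\} < \epsilon / W_0$, it is clear that $r(\xi, \mathfrak{D}_2) \le r(\xi, \mathfrak{D}_1) + \epsilon$. Hence
our lemma is proved.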
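
The construction used in case (b) of the proof can be written out as a small transformation of a decision rule. This is a sketch only: the name `truncate_rule` and the example rule are hypothetical, and the sketch assumes (as the proof implies but does not state in these words) that the truncated rule stops after exactly $m_\epsilon$ observations whenever the original rule would have gone further.

```python
def truncate_rule(n, d, m_eps, d0):
    """Build the rule D_2 of case (b) from D_1 = (n, d), stopping no later than m_eps."""

    def n2(x):
        return min(n(x), m_eps)  # assumed: stop at m_eps when D_1 would continue

    def d2(x):
        return d(x) if n(x) <= m_eps else d0  # d_0 replaces decisions reached too late

    return n2, d2


# Hypothetical rule D_1: sample until the first 1 appears (at most len(x) observations).
def n1(x):
    return x.index(1) + 1 if 1 in x else len(x)


def d1(x):
    return "signal" if 1 in x else "noise"


n2, d2 = truncate_rule(n1, d1, m_eps=2, d0="noise")
late = (0, 0, 0, 1, 0)   # D_1 needs four observations here
early = (0, 1, 0, 0, 0)  # D_1 stops after two observations here
```

On `early` the two rules agree, while on `late` the truncated rule stops after two observations and decides `d0`; the loss can increase only on sample points of the second kind, whose total probability is below $\epsilon / W_0$ in case (b).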

Let $Q^m$ denote the class of decision functions $\mathfrak{D}$ for which $n(x; \mathfrak{D}) \le m$ for
all $x$. For any positive $\epsilon$, let $Q^{m,\epsilon}$ denote the class of all decision functions
which satisfy the following two conditions simultaneously: (1) $n(x, \mathfrak{D}) \le m$ for
all $x$; (2) $d(x)$ is an element of $D^*_{x^1,\epsilon}$, where $D^*_{x^1,\epsilon}$ denotes the subset of $D$ having
the properties stated in Condition 2.3. Clearly, $Q^{m,\epsilon} \subset Q^m$. A probability
measure $\eta$ will be denoted by $\eta^m$ if $\eta(Q^m) = 1$, and by $\eta^{m,\epsilon}$ if $\eta(Q^{m,\epsilon}) = 1$.

Lemma 2.2. The following inequality holds:

(2.11)    $\sup_{\xi} \inf_{\eta^m} r(\xi, \eta^m) \le \sup_{\xi} \inf_{\eta^{m,\epsilon}} r(\xi, \eta^{m,\epsilon}) \le \sup_{\xi} \inf_{\eta^m} r(\xi, \eta^m) + \epsilon W_0$,

where $W_0$ is an upper bound of $W(F, d)$.

Proof: The first half of (2.11) is obvious. If we replace the subscript $x^1$ by
the chance variable $X^1$, the set $\omega_{x^1,\epsilon}$ defined in Condition 2.3 will be a random
subset of $\Omega$. It follows easily from the definition of $\omega_{x^1,\epsilon}$ that

(2.12)    prob $\{F \in \omega_{X^1,\epsilon}\} \ge 1 - \epsilon$.

With any decision function $\mathfrak{D} = d(x)$ we shall associate another decision function
$\mathfrak{D}^* = d^*(x)$ such that $n(x, \mathfrak{D}) = n(x, \mathfrak{D}^*)$; $d^*(x) = d(x)$ whenever $d(x) \in
D^*_{x^1,\epsilon}$; and $d^*(x)$ is an element of $D^*_{x^1,\epsilon}$ that is uniformly better than $d(x)$
relative to $\omega_{x^1,\epsilon}$ whenever $d(x) \notin D^*_{x^1,\epsilon}$. It follows from (2.12) and the fact that
$W_0$ is an upper bound of $W(F, d)$ that

(2.13)    $r(F, \mathfrak{D}^*) \le r(F, \mathfrak{D}) + \epsilon W_0$.

The second half of (2.11) is an immediate consequence of (2.13) and our lemma
is proved.

Lemma 2.3. The equation

(2.14)    $\sup_{\xi} \inf_{\eta^{m,\epsilon}} r(\xi, \eta^{m,\epsilon}) = \inf_{\eta^{m,\epsilon}} \sup_{\xi} r(\xi, \eta^{m,\epsilon})$

holds for all $m$ and $\epsilon$.





Proof. For any positive integral values $m, k$ and for any $\rho > 0$, let $\Omega^{m,k,\rho}$ be
the class of all elements $F$ of $\Omega$ for which

prob $\{X^1 \le k$ and $X^2 \le k$ and $\cdots$ $X^m \le k\} \ge 1 - \rho$.

A probability measure $\xi$ for which $\xi(\Omega^{m,k,\rho}) = 1$ will be denoted by $\xi^{m,k,\rho}$. To
prove (2.14), we shall first prove the inequality

(2.15)    $| \sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,\epsilon}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) - \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) | \le \rho (W_0 + C_m)$

where $C_m$ is an upper bound of $c(x^1, \dots, x^r, \mathfrak{D})$ for all $r \le m$, $x^1, \dots, x^r$ and $\mathfrak{D}$.

Since for any $d(x)$ in $Q^{m,\epsilon}$, $d(x)$ must be an element of $D^*_{x^1,\epsilon}$ and since $D^*_{x^1,\epsilon}$
is compact, it is sufficient to prove the validity of (2.14) in the case when $D^*_{x^1,\epsilon}$
is a finite set. Thus, we shall assume in the remainder of the proof that $D^*_{x^1,\epsilon}$
is finite.

Let $\delta$ be a given positive number and let $Q^{m,k,\epsilon}$ be a finite subset of $Q^{m,\epsilon}$ satisfying
the following condition: for any element $\mathfrak{D} = d(x)$ in $Q^{m,\epsilon}$ there exists an
element $\mathfrak{D}^* = d^*(x)$ in $Q^{m,k,\epsilon}$ such that

$d^*(x) = d(x)$ and $| c(x, \mathfrak{D}^*) - c(x, \mathfrak{D}) | \le \delta$

for all $x$ for which $x^1 \le k, x^2 \le k, \dots$, and $x^m \le k$. Clearly, for any choice of
$\delta$ there exists a finite subset $Q^{m,k,\epsilon}$ of $Q^{m,\epsilon}$ with the desired property. For any
$\mathfrak{D}$ in $Q^{m,\epsilon}$ we can then find an element $\mathfrak{D}^*$ in $Q^{m,k,\epsilon}$ such that

$r(F, \mathfrak{D}^*) \le r(F, \mathfrak{D}) + \rho (W_0 + C_m) + \delta$,

for all $F$ in $\Omega^{m,k,\rho}$. From this it follows that



where $\eta^{m,k,\epsilon}(Q^{m,k,\epsilon}) = 1$. Since $Q^{m,k,\epsilon}$ is finite, we have

(2.18)    $\sup_{\xi^{m,k,\rho}} \inf_{\eta^{m,k,\epsilon}} r = \inf_{\eta^{m,k,\epsilon}} \sup_{\xi^{m,k,\rho}} r$.

Inequality (2.15) follows from (2.16), (2.17) and (2.18) and the fact that $\delta$
can be chosen arbitrarily small.

Lemmas 1.1, 1.3 and the inequality (2.15) imply that Lemma 2.3 must hold
if

(2.19)    $\lim_{k \to \infty} \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r = \inf_{\eta^{m,\epsilon}} \sup_{\xi} r$.

Thus, the proof of Lemma 2.3 is completed if we can show the validity
of (2.19).

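Let $\{\eta_k^{m,\epsilon}\}$ ($k = 1, 2, \dots$, ad inf.) be a sequence of randomized decision functions
such that

(2.20)    $\lim_{k \to \infty} [ \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta_k^{m,\epsilon}) - \inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) ] = 0$.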





Let $\zeta_k = \zeta(\eta_k^{m,\epsilon})$ (see definition in Section 2.3) and let $\zeta_k$ be given by the two
sequences of functions $\{z_{r,k}(x^1, \dots, x^r)\}$ and $\{\delta_{x^1 \cdots x^r, k}\}$ ($r = 1, 2, \dots, m$).

Since there are only countably many samples $(x^1, \dots, x^r)$ ($r \le m$), there exists
a subsequence $\{k_l\}$ of the sequence $\{k\}$ such that

(2.21)    $\lim_{l \to \infty} z_{r,k_l}(x^1, \dots, x^r) = z_r(x^1, \dots, x^r)$

and

(2.22)    $\lim_{l \to \infty} \delta_{x^1 \cdots x^r, k_l} = \delta_{x^1 \cdots x^r}$

for all $r$ and all samples $(x^1, \dots, x^r)$. Let $\eta_0^{m,\epsilon}$ be a randomized decision function
such that $\zeta(\eta_0^{m,\epsilon})$ is equal to the $\zeta$ defined by $\{z_r(x^1, \dots, x^r)\}$ and
$\{\delta_{x^1 \cdots x^r}\}$ ($r = 1, 2, \dots, m$).

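For any element $F$ of $\Omega$ and for any $\nu > 0$, there exists a finite subset $M_\nu$ of
the $m$-dimensional sample space such that the probability (under $F$) that the
sample $(x^1, \dots, x^m)$ will fall in $M_\nu$ is $\ge 1 - \nu$. From this and the continuity
of the cost function (Condition 2.5) it follows that

(2.23)    $\lim_{l \to \infty} r(F, \eta_{k_l}^{m,\epsilon}) = r(F, \eta_0^{m,\epsilon})$ for all $F$.

Clearly,

(2.24)    $\sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta) = \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta)$

where $F^{m,k,\rho}$ is an element of $\Omega^{m,k,\rho}$. Hence

(2.25)    $\inf_{\eta^{m,\epsilon}} \sup_{\xi^{m,k,\rho}} r(\xi^{m,k,\rho}, \eta^{m,\epsilon}) = \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon})$.

Since any $F$ in $\Omega$ is contained in $\Omega^{m,k,\rho}$ for sufficiently large $k$, it follows from (2.20)
and (2.25) that

(2.26)    $\lim_{l \to \infty} r(F, \eta_{k_l}^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}$.

Hence, because of (2.23),

(2.27)    $r(F, \eta_0^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}$.

Thus,

(2.28)    $\inf_{\eta^{m,\epsilon}} \sup_{F} r(F, \eta^{m,\epsilon}) \le \lim_{k \to \infty} \{ \inf_{\eta^{m,\epsilon}} \sup_{F^{m,k,\rho}} r(F^{m,k,\rho}, \eta^{m,\epsilon}) \}$.

Since the left hand member of (2.28) cannot be smaller than the right hand
member, the equality sign must hold. This concludes the proof of Lemma 2.3.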

Theorem 2.1 can easily be proved with the help of Lemmas 2.1, 2.2 and 2.3.
From Lemma 2.2 it follows that

(2.29)    $\lim_{\epsilon \to 0} \sup_{\xi} \inf_{\eta^{m,\epsilon}} r = \sup_{\xi} \inf_{\eta^m} r$.





From this and Lemma 2.3 we obtain

(2.30)    $\lim_{\epsilon \to 0} \inf_{\eta^{m,\epsilon}} \sup_{\xi} r = \sup_{\xi} \inf_{\eta^m} r$.

But

$\lim_{\epsilon \to 0} \inf_{\eta^{m,\epsilon}} \sup_{\xi} r \ge \inf_{\eta^m} \sup_{\xi} r$.

Hence

(2.31)    $\inf_{\eta^m} \sup_{\xi} r \le \sup_{\xi} \inf_{\eta^m} r$.

Hence, because of Lemma 1.3, we then must have

(2.32)    $\sup_{\xi} \inf_{\eta^m} r = \inf_{\eta^m} \sup_{\xi} r$.

It follows from Lemma 2.1 that

(2.33)    $\lim_{m \to \infty} \sup_{\xi} \inf_{\eta^m} r = \sup_{\xi} \inf_{\eta} r$.

Hence, because of (2.32), we have

(2.34)    $\lim_{m \to \infty} \inf_{\eta^m} \sup_{\xi} r = \sup_{\xi} \inf_{\eta} r$.

But

(2.35)    $\lim_{m \to \infty} \inf_{\eta^m} \sup_{\xi} r \ge \inf_{\eta} \sup_{\xi} r$.

Hence

(2.36)    $\inf_{\eta} \sup_{\xi} r \le \sup_{\xi} \inf_{\eta} r$.

Theorem 2.1 is an immediate consequence of (2.36) and Lemma 1.3.

2.5. Theorems on complete classes of decision functions and minimax solutions.
For any positive $\epsilon$ we shall say that the randomized decision function $\eta_0$ is an
$\epsilon$-Bayes solution relative to the a priori distribution $\xi$ if

(2.37)    $r(\xi, \eta_0) \le \inf_{\eta} r(\xi, \eta) + \epsilon$.

If $\eta_0$ satisfies (2.37) for $\epsilon = 0$, we shall say that $\eta_0$ is a Bayes solution relative
to $\xi$.

A randomized decision rule $\eta_1$ is said to be uniformly better than $\eta_2$ if

(2.38)    $r(F, \eta_1) \le r(F, \eta_2)$ for all $F$

and if

(2.39)    $r(F, \eta_1) < r(F, \eta_2)$ at least for one $F$.

A class $C$ of randomized decision functions $\eta$ is said to be complete if for any
$\eta$ not in $C$ we can find an element $\eta^*$ in $C$ such that $\eta^*$ is uniformly better than $\eta$.
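
Definition (2.37) can be checked numerically when $\Omega$ and the set of decision rules are finite. The sketch below makes that assumption; the names `bayes_risk` and `is_eps_bayes`, the risk table and the prior are hypothetical.

```python
def bayes_risk(risk, prior, eta):
    """r(xi, eta) = sum over F of xi(F) * sum over D of eta(D) * r(F, D)."""
    return sum(
        prior[F] * sum(eta[D] * risk[F][D] for D in eta)
        for F in prior
    )


def is_eps_bayes(risk, prior, eta0, eps):
    """Check (2.37) for a finite table of risks r(F, D)."""
    # r(xi, eta) is linear in eta, so the infimum over randomized rules
    # is attained at one of the finitely many pure rules.
    rules = next(iter(risk.values())).keys()
    best = min(bayes_risk(risk, prior, {D: 1.0}) for D in rules)
    return bayes_risk(risk, prior, eta0) <= best + eps


# Hypothetical finite problem with two distributions and two pure rules.
RISK = {"F1": {"D1": 0.0, "D2": 2.0},
        "F2": {"D1": 3.0, "D2": 1.0}}
prior = {"F1": 0.8, "F2": 0.2}
eta0 = {"D1": 0.9, "D2": 0.1}  # randomized rule favouring D1

r0 = bayes_risk(RISK, prior, eta0)
```

Here the smallest attainable value of $r(\xi, \eta)$ is 0.6 (rule `D1`) and `eta0` has risk 0.72, so `eta0` is an $\epsilon$-Bayes solution relative to this prior for $\epsilon = 0.2$ but not for $\epsilon = 0.1$.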





Theorem 2.2. If Conditions 2.1-2.5 are fulfilled, then for any $\epsilon > 0$ the class
$C_\epsilon$ of all $\epsilon$-Bayes solutions corresponding to all possible a priori distributions $\xi$
is a complete class.

Proof: Let $\eta_0$ be a randomized decision function that is not an $\epsilon$-Bayes solution
relative to any $\xi$. That is,

(2.40)    $r(\xi, \eta_0) > \inf_{\eta} r(\xi, \eta) + \epsilon$ for all $\xi$.

If $r(F, \eta_0) = \infty$ for all $F$, then there is evidently an element of $C_\epsilon$ that is uniformly
better than $\eta_0$. Thus, we can restrict ourselves to the case where

(2.41)    $r(F, \eta_0) < \infty$ at least for one $F$.

Put

(2.42)    $W^*(F, d) = W(F, d) - r(F, \eta_0)$

and let $r^*(\xi, \eta)$ denote the risk when $W(F, d)$ is replaced by $W^*(F, d)$. Then

(2.43)    $r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0)$.

Let $Q^m$ denote the class of all decision functions $d(x)$ for which $n(x) \le m$
identically in $x$. Furthermore, denote any $\eta$ for which $\eta(Q^m) = 1$ by $\eta^m$. We
shall first prove the following relation.

(2.44)    $\sup_{\xi} \inf_{\eta^m} r^*(\xi, \eta^m) = \inf_{\eta^m} \sup_{\xi} r^*(\xi, \eta^m)$

for any positive integral value $m$. For any positive constant $c$, let $\Omega_c$ denote the
class of all elements $F$ for which $r(F, \eta_0) \le c$.

Clearly, Conditions 2.1-2.5 remain valid if we replace $W(F, d)$ by $W^*(F, d)$
and $\Omega$ by $\Omega_c$, where $c$ is restricted to values for which $\Omega_c$ is not empty. Hence,
Theorem 2.1 can be applied and we obtain


(2.45)    $\sup_{\xi^c} \inf_{\eta^m} r^*(\xi^c, \eta^m) = \inf_{\eta^m} \sup_{\xi^c} r^*(\xi^c, \eta^m)$,

where $\xi^c$ denotes any $\xi$ for which $\xi(\Omega_c) = 1$. Let $h$ and $w$ be two positive values
for which

(2.46)    $\sup_{\xi^c} \inf_{\eta^m} r^*(\xi^c, \eta^m) \ge -h$ for all $c$

and

(2.47)    $r(F, \eta^m) \le w$ for all $F$ and all $\eta^m$.

Clearly, such two constants $h$ and $w$ exist. From (2.46) and Lemma 1.3 we obtain

(2.48)    $\inf_{\eta^m} \sup_{\xi} r^*(\xi, \eta^m) \ge -h$.

Since

(2.49)    $r^*(F, \eta^m) < -(h + \delta)$ for any $F$ not in $\Omega_{h+\delta+w}$ ($\delta > 0$),





it follows from (2.48) that

(2.50)    $\inf_{\eta^m} \sup_{\xi^c} r^* = \inf_{\eta^m} \sup_{\xi} r^*$ for all $c > h + w$.

From (2.45) and (2.50) we obtain

(2.51)    $\sup_{\xi^c} \inf_{\eta^m} r^* = \inf_{\eta^m} \sup_{\xi} r^*$ for all $c > h + w$.

Hence,

(2.51a)    $\sup_{\xi} \inf_{\eta^m} r^* \ge \inf_{\eta^m} \sup_{\xi} r^*$.

Because of Lemma 1.3, the equality sign must hold and (2.44) is proved.
Since $\eta_0$ is not an element of $C_\epsilon$, we must have

(2.52)    $\inf_{\eta} r(\xi, \eta) < r(\xi, \eta_0) - \epsilon$.

From this it follows that

(2.53)    $\inf_{\eta} r^*(\xi, \eta) \le -\epsilon$.

Hence

(2.54)    $\sup_{\xi} \inf_{\eta} r^*(\xi, \eta) \le -\epsilon$.

It was shown in the proof of Lemma 2.1 that for any $\rho > 0$ there exists a
positive integer $m_\rho$, depending only on $\rho$, such that

(2.55)    $\inf_{\eta^{m_\rho}} r(\xi, \eta^{m_\rho}) \le \inf_{\eta} r(\xi, \eta) + \rho$ for all $\xi$.

From (2.44), (2.54) and (2.55) it follows that there exists a positive integer
$m_0$, namely $m_0 = m_{\epsilon/2}$, such that

(2.56)    $\inf_{\eta^m} \sup_{\xi} r^*(\xi, \eta^m) \le -\dfrac{\epsilon}{2}$ for any $m \ge m_0$.

From (2.44) and (2.56) it follows that there exists an a priori distribution $\xi_1$
and an $\epsilon$-Bayes solution $\eta_1$ relative to $\xi_1$ such that

(2.57)    $r^*(F, \eta_1) \le -\dfrac{\epsilon}{4}$ for all $F$.

Hence, because of (2.43),

(2.58)    $r(F, \eta_1) \le r(F, \eta_0) - \dfrac{\epsilon}{4}$ for all $F$,

and Theorem 2.2 is proved.

Theorem 2.3. If $D$ is compact, and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then there exists a minimax solution, i.e., a decision rule $\eta_0$ for which

(2.59)    $\sup_{F} r(F, \eta_0) \le \sup_{F} r(F, \eta)$ for all $\eta$.

To prove the above theorem, we shall first prove the following lemma.

Lemma 2.4. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then for any sequence $\{\eta_i\}$ ($i = 1, 2, \dots$, ad inf.) of randomized decision functions
for which $r(F, \eta_i)$ is a bounded function of $F$ and $i$, there exists a subsequence $\{\eta_{i_j}\}$
($j = 1, 2, \dots$, ad inf.) and a randomized decision function $\eta_0$ such that

(2.60)    $\liminf_{j \to \infty} r(\xi, \eta_{i_j}) \ge r(\xi, \eta_0)$ for all $\xi$.

Proof: Let $\zeta_i = \zeta(\eta_i)$ (defined in Section 2.3) be given by $\{z_{r,i}(x^1, \dots, x^r)\}$
and $\{\delta_{x^1 \cdots x^r, i}\}$ ($r = 1, 2, \dots$, ad inf.). Thus, $z_{r,i}(x^1, \dots, x^r)$ is the conditional
probability that we shall take an observation on $X^{r+1}$ using the rule
$\eta_i$ and knowing that the first $r$ observations are given by $x^1, \dots, x^r$ and that experimentation
was not terminated for $(x^1, \dots, x^k)$ ($k < r$). As stated in Section
2.3, for any $r, x^1, \dots, x^r$ the symbol $\delta_{x^1 \cdots x^r, i}$ denotes the conditional probability
distribution of the selected $d$ when $\eta_i$ is used and it is known that the first $r$ observations
are equal to $x^1, \dots, x^r$ and that $n(x) = r$. Since there are only countably
many finite samples $(x^1, \dots, x^r)$, it is possible to find a subsequence $\{i_j\}$ of
$\{i\}$ such that $\lim_{j \to \infty} z_{r,i_j}(x^1, \dots, x^r)$ and $\lim_{j \to \infty} \delta_{x^1 \cdots x^r, i_j}$ exist.^5 Put

(2.61)    $\lim_{j \to \infty} z_{r,i_j}(x^1, \dots, x^r) = z_{r,0}(x^1, \dots, x^r)$

and

(2.62)    $\lim_{j \to \infty} \delta_{x^1 \cdots x^r, i_j} = \delta_{x^1 \cdots x^r, 0}$.

As shown in Section 2.3, there exists a randomized decision function $\eta_0$ such that
$\zeta_0 = \zeta(\eta_0)$ is given by $\{z_{r,0}(x^1, \dots, x^r)\}$ and $\{\delta_{x^1 \cdots x^r, 0}\}$. Let $q_{r,i}(x^1, \dots, x^r \mid \xi)$
denote the probability that the sample $(x^1, \dots, x^r)$ will be obtained and that
experimentation will be stopped at the $r$-th observation when $\xi$ is the a priori
distribution and $\eta_i$ is the decision rule used by the statistician. For any sample
$(x^1, \dots, x^r)$ let $R_i(x^1, \dots, x^r)$ denote the expected value of $W(F, d)$ when the
distribution of $F$ is equal to the a posteriori distribution of $F$ as implied by $\xi$
and $(x^1, \dots, x^r)$ and where $d$ is a chance variable independent of $F$ with the
probability distribution $\delta_{x^1 \cdots x^r, i}$. Since $r(\xi, \eta_i)$ is bounded by assumption,
the probability that experimentation will go on indefinitely is equal to zero.
From this it follows that

(2.63)    $\sum_{r, x^1, \dots, x^r} q_{r,i}(x^1, \dots, x^r \mid \xi) = 1$ for all $\xi$.

^5 The existence of $\lim \delta_{x^1 \cdots x^r, i_j}$ follows from the compactness of $D$ (see Theorem 3.0
in [3]).





Then $r(\xi, \eta_i)$ is given by

(2.64)    $r(\xi, \eta_i) = \sum_{r, x^1, \dots, x^r} q_{r,i}(x^1, \dots, x^r \mid \xi) \left[ R_i(x^1, \dots, x^r) + \dfrac{\int_{Q_{x^1 \cdots x^r}} c(x^1, \dots, x^r, \mathfrak{D})\, d\eta_i}{\int_{Q_{x^1 \cdots x^r}} d\eta_i} \right]$

where $Q_{x^1 \cdots x^r}$ is the totality of all decision functions $d(x)$ for which $n(y) = r$
whenever $y^1 = x^1, \dots, y^r = x^r$. Clearly,

(2.65)    $\lim_{j \to \infty} q_{r,i_j}(x^1, \dots, x^r \mid \xi) = q_{r,0}(x^1, \dots, x^r \mid \xi)$.

Since $D$ is compact and since $W(F, d)$ is a continuous function of $d$ uniformly
in $F$ (in the sense of the metric defined in $D$), we have

(2.66)    $\lim_{j \to \infty} R_{i_j}(x^1, \dots, x^r) = R_0(x^1, \dots, x^r)$.

From Condition 2.5 it follows that

(2.67)    $\lim_{j \to \infty} \dfrac{\int_{Q_{x^1 \cdots x^r}} c(x^1, \dots, x^r, \mathfrak{D})\, d\eta_{i_j}}{\int_{Q_{x^1 \cdots x^r}} d\eta_{i_j}} = \dfrac{\int_{Q_{x^1 \cdots x^r}} c(x^1, \dots, x^r, \mathfrak{D})\, d\eta_0}{\int_{Q_{x^1 \cdots x^r}} d\eta_0}$.

Lemma 2.4 is an immediate consequence of the equations (2.64)-(2.67).

We are now in a position to prove Theorem 2.3. Because of Theorem 2.1
there exists a sequence $\{\eta_i\}$ such that

(2.68)    $\lim_{i \to \infty} \sup_{F} r(F, \eta_i) = \inf_{\eta} \sup_{F} r(F, \eta)$.

According to Lemma 2.4 there exists a subsequence $\{\eta_{i_j}\}$ ($j = 1, 2, \dots$, ad inf.)
and a randomized decision function $\eta_0$ such that

(2.69)    $\liminf_{j \to \infty} r(F, \eta_{i_j}) \ge r(F, \eta_0)$ for all $F$.

It follows from (2.68) and (2.69) that $\eta_0$ is a minimax solution and Theorem
2.3 is proved.

Theorem 2.4. If $D$ is compact and if Conditions 2.1, 2.2, 2.4, 2.5 are fulfilled,
then for any $\xi$ there exists a Bayes solution relative to $\xi$.

This theorem is an immediate consequence of Lemma 2.4.

We shall say that $\eta_0$ is a Bayes solution in the wide sense, if there exists a
sequence $\{\xi_i\}$ ($i = 1, 2, \dots$, ad inf.) such that

(2.70)    $\lim_{i \to \infty} [ r(\xi_i, \eta_0) - \inf_{\eta} r(\xi_i, \eta) ] = 0$.

We shall say that $\eta_0$ is a Bayes solution in the strict sense, if there exists a $\xi$
such that $\eta_0$ is a Bayes solution relative to $\xi$.





Theorem 2.5. If $D$ is compact and Conditions 2.1-2.5 hold, then the class of all
Bayes solutions in the wide sense is a complete class.

Proof: Let $\eta_0$ be a decision rule that is not a Bayes solution in the wide sense.
Consider the weight function $W^*(F, d) = W(F, d) - r(F, \eta_0)$. We may assume
that $r(F, \eta_0) < \infty$ for at least some $F$, since otherwise there obviously exists a
Bayes solution in the wide sense that is uniformly better than $\eta_0$. Then it
follows easily from (2.44) and Lemmas 2.1 and 1.3 that

(2.71)    $\sup_{\xi} \inf_{\eta} r^*(\xi, \eta) = \inf_{\eta} \sup_{\xi} r^*(\xi, \eta) = v^*$ (say),

where $r^*(\xi, \eta)$ is the risk corresponding to $W^*(F, d)$, i.e.,

(2.72)    $r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0)$.

Theorem 2.3 is clearly applicable to the risk function $r^*(\xi, \eta)$. Then, there
exists a minimax solution $\eta_1$ for the problem corresponding to the new weight
function $W^*(F, d)$. Since, because of (2.72), $v^* \le 0$, we have

(2.73)    $r^*(\xi, \eta_1) = r(\xi, \eta_1) - r(\xi, \eta_0) \le 0$ for all $\xi$.

Our theorem is proved, if we can show that $\eta_1$ is a Bayes solution in the wide
sense. Let $\{\xi_i\}$ ($i = 1, 2, \dots$, ad inf.) be a sequence of a priori distributions
such that

(2.74)    $\lim_{i \to \infty} \inf_{\eta} r^*(\xi_i, \eta) = v^*$.

Since $\eta_1$ is a minimax solution, we must have

(2.75)    $r^*(\xi_i, \eta_1) \le v^*$.

It follows from (2.74) and (2.75) that $\eta_1$ is a Bayes solution in the wide sense
and our theorem is proved.

We shall now formulate an additional condition which will permit the derivation
of some stronger theorems. First, we shall give a convergence definition
in the space $\Omega$. We shall say that $F_i$ converges to $F$ in the ordinary sense if

(2.76) $\lim_{i \to \infty} p_r(x^1, \cdots, x^r \mid F_i) = p_r(x^1, \cdots, x^r \mid F)$ ($r = 1, 2, \cdots$, ad inf.).

Here $p_r(x^1, \cdots, x^r \mid F)$ denotes the probability, under $F$, that the first $r$ observations
will be equal to $x^1, \cdots, x^r$, respectively. We shall say that a subset $\omega$
of $\Omega$ is compact in the ordinary sense, if $\omega$ is compact in the sense of the convergence
definition (2.76).

Condition 2.6. The space $\Omega$ is compact in the ordinary sense. If $F_i$ converges
to $F$, as $i \to \infty$, in the ordinary sense, then

$\lim_{i \to \infty} W(F_i, d) = W(F, d)$

uniformly in $d$.

Theorem 2.6. If D is compact and if Conditions 2.1, 2.2, 2.4, 2.5, 2.6 hold,
then:





(i) there exists a least favorable a priori distribution, i.e., an a priori distribution
$\xi_0$ for which

$\operatorname{Inf}_{\eta} r(\xi_0, \eta) = \operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta} r(\xi, \eta).$

(ii) A minimax solution exists and any minimax solution is a Bayes solution
in the strict sense.

(iii) If $\eta_0$ is a decision rule which is not a Bayes solution in the strict sense and
for which $r(F, \eta_0)$ is a bounded function of $F$, then there exists a decision rule $\eta_1$
which is a Bayes solution in the strict sense and is uniformly better than $\eta_0$.

Proof: Let $\{\xi_i\}$ ($i = 1, 2, \cdots$, ad inf.) be a sequence of a priori distributions
such that

(2.77) $\lim_{i \to \infty} \operatorname{Inf}_{\eta} r(\xi_i, \eta) = \operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta} r(\xi, \eta).$

Since $\Omega$ is compact in the ordinary sense, there exists an a priori distribution
$\xi_0$ and a subsequence $\{\xi_{i_j}\}$ of $\{\xi_i\}$ such that

(2.78) $\lim_{j \to \infty} \xi_{i_j}(\omega) = \xi_0(\omega)$

for any subset $\omega$ of $\Omega$ which is open (in the sense of the ordinary convergence
definition in $\Omega$) and for which $\xi_0(\omega^*) = 0$, where $\omega^*$ denotes the set of all boundary
points of $\omega$. We shall show that $\xi_0$ is a least favorable distribution. Assume
that it is not. Then there exists a decision function $\mathfrak{D}_0 = d_0(x)$ such that

(2.79) $r(\xi_0, \mathfrak{D}_0) \le v - \delta,$


where $\delta > 0$ and $v$ denotes the common value of $\operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta} r$ and $\operatorname{Inf}_{\eta} \operatorname{Sup}_{\xi} r$. It was
shown in the proof of Lemma 2.1 that (2.79) implies the existence of a decision
function $\mathfrak{D}_1 = d_1(x)$ and that of a positive integer $m$ such that

(2.80) $n(x; \mathfrak{D}_1) \le m$ for all $x$

and

(2.81) $r(\xi_0, \mathfrak{D}_1) \le v - \frac{\delta}{2}.$

Since $c(x^1, \cdots, x^n; \mathfrak{D}_1)$ and $W(F, d)$ are uniformly bounded and $W(F, d)$ is
continuous in $F$ uniformly in $d$, we have

(2.82) $\lim_{i \to \infty} r(F_i, \mathfrak{D}_1) = r(F, \mathfrak{D}_1)$

for any sequence $\{F_i\}$ for which $F_i \to F$ in the ordinary sense. From (2.78),
(2.82) and the compactness of $\Omega$ (in the ordinary sense) it follows that


(2.83) $\lim_{j \to \infty} r(\xi_{i_j}, \mathfrak{D}_1) = r(\xi_0, \mathfrak{D}_1) \le v - \frac{\delta}{2}.$

But this is in contradiction to (2.77) and, therefore, $\xi_0$ must be a least favorable
distribution. Hence, statement (i) of our theorem is proved.

Statement (ii) is an immediate consequence of Theorems 2.1 and 2.3 and
statement (i) of Theorem 2.6.

To prove (iii), replace the weight function $W(F, d)$ by $W^*(F, d) =
W(F, d) - r(F, \eta_0)$ where $\eta_0$ satisfies the conditions imposed on it in (iii).

We shall show that (i) remains valid also when $W(F, d)$ is replaced by
$W^*(F, d)$. This is not clear, since $W^*(F, d)$ may not be continuous in $F$. First
we shall prove that

(2.84) $\liminf_{i \to \infty} r(\xi'_i, \eta_0) \ge r(\xi'_0, \eta_0)$

for any sequence $\{\xi'_i\}$ for which $\xi'_i \to \xi'_0$ in the ordinary sense, i.e., for which

(2.85) $\lim_{i \to \infty} \xi'_i(\omega) = \xi'_0(\omega)$

for any open subset $\omega$ (open in the sense of ordinary convergence defined in $\Omega$)
whose boundary has probability measure zero according to $\xi'_0$. For any sample
$x^1, \cdots, x^r$ let $q_{r,i}(x^1, \cdots, x^r)$ denote the probability that the first $r$ observations
will be equal to $x^1, \cdots, x^r$, respectively, when $\xi'_i$ is the a priori distribution.
Clearly,

(2.86) $q_{r,i}(x^1, \cdots, x^r) = \int_{\Omega} p_r(x^1, \cdots, x^r \mid F) \, d\xi'_i.$

Since $p_r(x^1, \cdots, x^r \mid F)$ is a continuous function of $F$, we have

(2.87) $\lim_{i \to \infty} q_{r,i}(x^1, \cdots, x^r) = q_{r,0}(x^1, \cdots, x^r).$


The function $r(\xi, \eta_0)$ can be split into two parts, i.e., $r(\xi, \eta_0) = r_1(\xi, \eta_0) + r_2(\xi, \eta_0)$
where $r_1$ is the expected value of the loss $W(F, d)$ and $r_2$ is the expected cost of
experimentation. Since $W(F, d)$ is a bounded function of $F$ and $d$, and since
$W(F, d)$ is continuous in $F$ uniformly in $d$, we have

(2.88) $\lim_{i \to \infty} r_1(\xi'_i, \eta_0) = r_1(\xi'_0, \eta_0)$

for any sequence $\{\xi'_i\}$ which satisfies (2.85). To prove (2.84), we merely have
to show that

(2.89) $\liminf_{i \to \infty} r_2(\xi'_i, \eta_0) \ge r_2(\xi'_0, \eta_0).$

But

(2.90) $r_2(\xi'_i, \eta_0) = \sum_{r=1}^{\infty} \sum_{x^1, \cdots, x^r} q_{r,i}(x^1, \cdots, x^r) \int_{Q^*_{x^1 \cdots x^r}} c(x^1, \cdots, x^r; \mathfrak{D}) \, d\eta_0$

where $Q^*_{x^1 \cdots x^r}$ is the totality of all decision functions $d(x)$ with the property
that $n(y) = r$ for any $y$ whose first $r$ coordinates are equal to $x^1, \cdots, x^r$, respectively.
Equation (2.89) is an immediate consequence of (2.87) and (2.90).
Hence, (2.84) is proved.





Let $r^*(\xi, \eta)$ be the risk function when $W(F, d)$ is replaced by $W^*(F, d)$, i.e.,
$r^*(\xi, \eta) = r(\xi, \eta) - r(\xi, \eta_0)$. Let, furthermore, $\{\xi^*_i\}$ be a sequence of a priori
distributions such that

(2.91) $\lim_{i \to \infty} \operatorname{Inf}_{\eta} r^*(\xi^*_i, \eta) = \operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta} r^*(\xi, \eta).$

There exists a subsequence $\{\xi^*_{i_j}\}$ of the sequence $\{\xi^*_i\}$ such that $\xi^*_{i_j}$ converges (in
the ordinary sense) to a limit distribution $\xi^*_0$ as $j \to \infty$. We shall show that
$\xi^*_0$ is a least favorable distribution. For suppose that $\xi^*_0$ is not a least favorable
distribution. Then there exists a decision function $\mathfrak{D}^*_0 = d^*_0(x)$ such that

(2.92) $r^*(\xi^*_0, \mathfrak{D}^*_0) \le v^* - \delta$

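where $\delta > 0$ and $v^* = \operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta} r^* = \operatorname{Inf}_{\eta} \operatorname{Sup}_{\xi} r^*$. But then there exists a decision
function $\mathfrak{D}^*_1 = d^*_1(x)$ and a positive integer $m$ such that

(2.93) $n(x; \mathfrak{D}^*_1) \le m$ for all $x$

and

(2.94) $r^*(\xi^*_0, \mathfrak{D}^*_1) \le v^* - \frac{\delta}{2}.$

Since $r^*(\xi, \mathfrak{D}^*_1) = r(\xi, \mathfrak{D}^*_1) - r(\xi, \eta_0)$, and since

$\lim_{j \to \infty} r(\xi^*_{i_j}, \mathfrak{D}^*_1) = r(\xi^*_0, \mathfrak{D}^*_1),$

it follows from (2.84) and (2.94) that

(2.95) $\limsup_{j \to \infty} r^*(\xi^*_{i_j}, \mathfrak{D}^*_1) \le v^* - \frac{\delta}{2},$

which is in contradiction to (2.91). Hence, the validity of (i) is proved also
when $W(F, d)$ is replaced by $W^*(F, d)$. Clearly, also (ii) remains valid when
$W(F, d)$ is replaced by $W^*(F, d)$.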

Let $\eta_1$ be a minimax solution relative to the problem corresponding to
$W^*(F, d)$. Then because of (ii), $\eta_1$ is a Bayes solution in the strict sense.
Since $\eta_0$ is not a Bayes solution in the strict sense, $\eta_1 \ne \eta_0$ and $v^* < 0$. Hence
$\eta_1$ is uniformly better than $\eta_0$. This completes the proof of Theorem 2.6.
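As a purely illustrative sketch of statements (i) and (ii) of Theorem 2.6, consider a toy problem with two distributions $F_0$, $F_1$ and two decision rules. The risk table below is hypothetical (the numbers are assumptions chosen for the illustration, not taken from the text), and the function names are likewise made up. The code computes a least favorable a priori distribution and a minimax mixture by the usual equalizing formulas for a $2 \times 2$ table without a saddle point, and checks that the mixture attains the Bayes risk under that a priori distribution.

```python
from fractions import Fraction as Fr

# Hypothetical risk table (an assumption for illustration only):
# RISK[i][j] = r(F_i, rule_j) for two distributions and two decision rules.
RISK = [[Fr(0), Fr(3)],
        [Fr(2), Fr(1)]]


def bayes_risk(p, j):
    """Risk of pure rule j when the prior puts weight p on F_0 and 1 - p on F_1."""
    return p * RISK[0][j] + (1 - p) * RISK[1][j]


def mixed_risk(i, q):
    """Risk under F_i of the randomized rule that uses rule 0 with probability q."""
    return q * RISK[i][0] + (1 - q) * RISK[i][1]


def solve_2x2():
    """Equalizing solution of a 2x2 table that has no saddle point."""
    (a, b), (c, d) = RISK
    den = a + d - b - c
    p0 = (d - c) / den          # prior weight on F_0 making both rules equally good
    q0 = (d - b) / den          # mixing weight making the risk equal under F_0 and F_1
    v = (a * d - b * c) / den   # common value of Sup Inf r and Inf Sup r
    return p0, q0, v


p0, q0, v = solve_2x2()
# Inf over rules of the Bayes risk under the prior p0 (attained at a pure rule).
inf_at_p0 = min(bayes_risk(p0, j) for j in range(2))
# Sup over distributions of the risk of the mixture q0.
sup_at_q0 = max(mixed_risk(i, q0) for i in range(2))
print(p0, q0, v, inf_at_p0, sup_at_q0)
```

In this toy table the mixture with weight `q0` is a minimax rule and is at the same time a Bayes solution relative to the a priori distribution with weight `p0`, which mirrors statement (ii) in the simplest possible setting.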

We shall now replace Condition 2.6 by the following weaker one.

Condition 2.6*. There exists a sequence $\{\Omega_i\}$ ($i = 1, 2, \cdots$, ad inf.) of
subsets of $\Omega$ such that Condition 2.6 is fulfilled when $\Omega$ is replaced by $\Omega_i$, $\Omega_{i+1} \supset \Omega_i$
and $\lim_{i \to \infty} \Omega_i = \Omega$.

We shall say that $\eta_i$ converges weakly to $\eta$ as $i \to \infty$, if $\lim_{i \to \infty} \zeta(\eta_i) = \zeta(\eta)$.
We shall also say that $\eta$ is a weak limit of $\eta_i$. This limit definition seems to be
natural, since $r(\xi, \eta_1) = r(\xi, \eta_2)$ if $\zeta(\eta_1) = \zeta(\eta_2)$. We shall now prove the following
theorem:





Theorem 2.7. If D is compact and if Conditions 2.1, 2.2, 2.4, 2.5 and 2.6* are
fulfilled, then:

(i) A minimax solution exists that is a weak limit of a sequence of Bayes solutions
in the strict sense.

(ii) Let $\eta_0$ be a decision rule for which $r(F, \eta_0)$ is a bounded function of $F$. Then
there exists a decision rule $\eta_1$ that is a weak limit of a sequence of Bayes solutions
in the strict sense and such that $r(F, \eta_1) \le r(F, \eta_0)$ for all $F$ in $\Omega$.

Proof: According to Theorem 2.6, there exists a decision rule $\eta_i$ that is a Bayes
solution in the strict sense and a minimax solution if $\Omega$ is replaced by $\Omega_i$. There
exists a subsequence $\{\eta_{i_j}\}$ ($j = 1, 2, \cdots$, ad inf.) of the sequence $\{\eta_i\}$ such that
$\{\eta_{i_j}\}$ admits a weak limit. Let $\eta_0$ be a weak limit of $\{\eta_{i_j}\}$. Then, as shown in
the proof of Lemma 2.4, equation (2.60) holds and $\eta_0$ is a minimax solution relative
to the original space $\Omega$. Thus, statement (i) is proved.

To prove (ii), replace $W(F, d)$ by $W^*(F, d) = W(F, d) - r(F, \eta_0)$. According
to Theorem 2.6 there exists a decision rule $\eta_{1i}$ such that $\eta_{1i}$ is a minimax solution
and a Bayes solution in the strict sense when $\Omega$ is replaced by $\Omega_i$ and $W(F, d)$
by $W^*(F, d)$. Clearly, $\eta_{1i}$ remains a Bayes solution in the strict sense also
relative to $\Omega$ and $W(F, d)$. Since $\eta_{1i}$ is a minimax solution relative to $\Omega_i$ and
$W^*(F, d)$, we have

(2.96) $r(F, \eta_{1i}) \le r(F, \eta_0)$ for all $F$ in $\Omega_i$.

Let $\{\eta_{1 i_j}\}$ be a subsequence of the sequence $\{\eta_{1i}\}$ such that $\{\eta_{1 i_j}\}$ admits a weak
limit $\eta_1$. Then, (2.60) holds for $\{\eta_{1 i_j}\}$ and $\eta_1$, and

(2.97) $r(F, \eta_1) \le r(F, \eta_0)$ for all $F$ in $\Omega$.

Since $\eta_1$ is a weak limit of strict Bayes solutions, statement (ii) is proved.

3. Statistical decision functions: the case of continuous chance variables. 

3.1. Introductory remarks. In this section we shall be concerned with the
case where the probability distribution $F$ of $X$ is absolutely continuous, i.e.,
for any element $F$ of $\Omega$ and for any positive integer $r$ there exists a joint density
function $p_r(x^1, \cdots, x^r \mid F)$ of the first $r$ chance variables $X^1, \cdots, X^r$.

The continuous case can immediately be reduced to the discrete case discussed
in section 2 if the observations are not given exactly but only up to a finite number
of decimal places. More precisely, we mean this: For each $i$, let the real
axis $R$ be subdivided into a denumerable number of disjoint sets $R_{i1}, R_{i2}, \cdots$,
ad inf. Suppose that the observed value $x^i$ of $X^i$ is not given exactly; it is merely
known which element of the sequence $\{R_{ij}\}$ ($j = 1, 2, \cdots$, ad inf.) contains
$x^i$. This is the situation, for example, if the value of $x^i$ is given merely up to a
finite number, say $r$, decimal places ($r$ fixed, independent of $i$). This case can
be reduced to the previously discussed discrete case, since we can regard the
sets $R_{ij}$ as our points, i.e., we can replace the chance variable $X^i$ by $Y^i$ where
$Y^i$ can take only the values $R_{i1}, R_{i2}, \cdots$, ad inf. ($Y^i$ takes the value $R_{ij}$ if $X^i$
falls in $R_{ij}$). If $W(F_1, d) = W(F_2, d)$ whenever the distribution of $Y$ under
$F_1$ is identical with that under $F_2$, only the chance variables $Y^1, Y^2, \cdots$, etc.
play a role in the decision problem and we have the discrete case. If the latter
condition on the weight function is not fulfilled, i.e., if there exists a pair $(F_1, F_2)$
such that $W(F_1, d) \ne W(F_2, d)$ for some $d$ and the distribution of $Y$ is the same
under $F_1$ as under $F_2$, we can still reduce the problem to the discrete case, if in
the discrete case we permit the weight $W$ to depend also on a third extraneous
variable $G$, i.e., if we put $W = W(F, G, d)$, where $G$ is a variable about whose
value the sample does not give any information. The results obtained in the
discrete case can easily be generalized to include the situation where $W =
W(F, G, d)$.
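The reduction to the discrete case can be sketched numerically. The snippet below is only an illustration under assumptions that are not in the text: the cells $R_{ij}$ are taken to be the intervals of length $10^{-r}$ obtained by truncating to $r$ decimal places, and $X^i$ is given a hypothetical exponential density. It maps an observation to the index of its cell and computes the probability that the discrete variable $Y^i$ takes a given cell as its value.

```python
import math

R_DECIMALS = 1  # hypothetical number r of recorded decimal places


def cell_index(x, r=R_DECIMALS):
    """Index j of the cell [j / 10**r, (j + 1) / 10**r) that contains x."""
    return math.floor(x * 10 ** r)


def exp_cdf(x, rate=1.0):
    """Distribution function of a hypothetical exponential X^i (zero for x <= 0)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def cell_probability(j, r=R_DECIMALS, rate=1.0):
    """Probability that Y^i equals cell j, i.e. that X^i falls in that cell."""
    width = 10.0 ** (-r)
    return exp_cdf((j + 1) * width, rate) - exp_cdf(j * width, rate)


# Probabilities of the first 400 cells, which together cover the interval [0, 40).
probs = [cell_probability(j) for j in range(400)]
total = sum(probs)
```

The list `probs` is the discrete distribution of $Y^i$ induced by the assumed density; up to the negligible mass beyond 40 it sums to one, so nothing is lost by passing from $X^i$ to $Y^i$ except the position of the observation inside its cell.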

In practical applications the observed value $x^i$ of $X^i$ will usually be given
only up to a certain number of decimal places and, thus, the problem can be
reduced to the discrete case. Nevertheless, it seems desirable from the theoretical
point of view to develop the theory of the continuous case, assuming
that the observed value $x^i$ of $X^i$ is given precisely.

In section 2.3 an alternative definition of a randomized decision rule was given
in terms of two sequences of functions $\{z_r(x^1, \cdots, x^r)\}$ and $\{\delta_{x^1 \cdots x^r}\}$ ($r = 1, 2,
\cdots$, ad inf.). We used the symbol $\zeta$ to denote a randomized decision rule given
by two such sequences. It was shown in the discrete case that the use of a
randomized decision function $\eta$ generates a certain $\zeta = \zeta(\eta)$, and that for any
given $\zeta$ there exists an $\eta$ such that $\zeta = \zeta(\eta)$. Furthermore, because of Condition
2.5, in the discrete case we had $r(F, \eta_1) = r(F, \eta_2)$ if $\zeta(\eta_1) = \zeta(\eta_2)$. It would be
possible to develop a similar theory as to the relation between $\zeta$ and $\eta$ also in the
continuous case. However, a somewhat different procedure will be followed for
the sake of simplicity. Instead of the decision functions $d(x)$, we shall regard
the $\zeta$'s as the pure strategies of the statistician, i.e., we replace the space $Q$ of
all decision functions $d(x)$ by the space $Z$ of all randomized decision rules $\zeta$.
It will then be necessary to consider probability measures $\eta$ defined over an
additive class of subsets of $Z$. It will be sufficient, as will be seen later, to consider
only discrete probability measures $\eta$. A probability measure $\eta$ is said to be
discrete, if it assigns the probability 1 to some denumerable subset of $Z$. Any
discrete $\eta$ will clearly generate a certain $\zeta = \zeta(\eta)$. In the next section we shall
formulate some conditions which will imply that $r(F, \eta_1) = r(F, \eta_2)$ if $\zeta(\eta_1) =
\zeta(\eta_2)$. Thus, it will be possible to restrict ourselves to consideration of pure
strategies $\zeta$ which will cause considerable simplifications.

The definitions of various notions given in the discrete case, such as minimax
solution, Bayes solution, a priori distribution $\xi$ in $\Omega$, least favorable a priori
distribution, complete class of decision functions, etc., can immediately be
extended to the continuous case and will, therefore, not be restated here.

3.2. Conditions on $\Omega$, $D$, $W(F, d)$ and the cost function. In this section we shall
formulate conditions similar to those given in the discrete case.

Condition 3.1. Each element $F$ of $\Omega$ is absolutely continuous.

Condition 3.2. $W(F, d)$ is a bounded function of $F$ and $d$.

Condition 3.3. The space $D$ is compact in the sense of its intrinsic metric
$\delta(d_1, d_2; \Omega)$ (see equation 2.2).





This condition is somewhat stronger than the corresponding Condition 2.3. 
While it may be possible to weaken this condition, it would make the proofs of 
certain theorems considerably more involved. 

Condition 3.4. The cost of experimentation $c(x^1, \cdots, x^m)$ does not depend on
$\zeta$. It is non-negative and $\lim_{m \to \infty} c(x^1, \cdots, x^m) = \infty$ uniformly in $x^1, \cdots, x^m$. For
each positive integral value $m$, $c(x^1, \cdots, x^m)$ is a bounded function of $x^1, \cdots, x^m$.

This condition is stronger than Conditions 2.4 and 2.5 postulated in the discrete
case. The reason for formulating a stronger condition here is that we wish
the relation $r(F, \eta_1) = r(F, \eta_2)$ to be fulfilled whenever $\zeta(\eta_1) = \zeta(\eta_2)$ which will
make it possible for us to eliminate the consideration of $\eta$'s altogether. Since
the $\zeta$'s are regarded here as the pure strategies of the statistician, it is not clear
what kind of dependence of the cost on $\zeta$ would be consistent with the requirement
that $r(F, \eta_1) = r(F, \eta_2)$ whenever $\zeta(\eta_1) = \zeta(\eta_2)$.

We shall say that $F_i \to F$ in the ordinary sense, if for any positive integral
value $m$

$\lim_{i \to \infty} \int_{S_m} p_m(x^1, \cdots, x^m \mid F_i) \, dx^1 \cdots dx^m = \int_{S_m} p_m(x^1, \cdots, x^m \mid F) \, dx^1 \cdots dx^m$

uniformly in $S_m$ where $S_m$ is a subset of the $m$-dimensional sample space.

Condition 3.5. The space $\Omega$ is separable in the sense of the above convergence
definition.

No such condition was formulated in the discrete case for the simple reason
that in the discrete case $\Omega$ is always separable in the sense of the convergence
definition given in (2.76).

3.3. Some lemmas. We shall first give a convergence definition in the space
$Z$ of all $\zeta$'s which is somewhat different from the one given in the discrete case.
Let $h_r(x^1, x^2, \cdots, x^r, D^*)$ denote the probability that experimentation will be
terminated with the $r$th observation and that the final decision $d$ selected will
be an element of $D^*$, knowing that the first $r$ observations are equal to $x^1, \cdots, x^r$,
respectively. That is,

(3.1) $h_r(x^1, \cdots, x^r, D^*) = z_1(x^1) z_2(x^1, x^2) \cdots z_{r-1}(x^1, \cdots, x^{r-1}) \bigl(1 - z_r(x^1, \cdots, x^r)\bigr) \delta_{x^1 \cdots x^r}(D^*).$

Clearly, the functions $h_r(x^1, \cdots, x^r, D^*)$ are non-negative and satisfy the following
conditions:

(3.2) $\sum_{r=1}^{m} h_r(x^1, \cdots, x^r, D^*) \le 1$ for any $D^*$ and for any sample $(x^1, \cdots, x^m)$.

(3.3) $\sum_{i=1}^{\infty} h_r(x^1, \cdots, x^r, D^*_i) = h_r(x^1, \cdots, x^r, D^*)$,

if $\sum_{i=1}^{\infty} D^*_i = D^*$ and $D^*_1, D^*_2, \cdots$, etc. are disjoint.

* For a definition of a separable space, see F. Hausdorff, Mengenlehre (3rd edition), p. 126, 





One can easily verify that for any sequence of non-negative functions
$\{h_r(x^1, \cdots, x^r, D^*)\}$ ($r = 1, 2, \cdots$) satisfying (3.2) and (3.3) there exists exactly
one sequence $\{z_r(x^1, \cdots, x^r)\}$ and one sequence $\{\delta_{x^1 \cdots x^r}(D^*)\}$ such that (3.1)
is fulfilled. Thus, a randomized decision rule $\zeta$ can be given by a sequence
$\{h_r(x^1, \cdots, x^r, D^*)\}$ satisfying (3.2) and (3.3). The functions $z_r(x^1, \cdots, x^r)$ and
$\delta_{x^1 \cdots x^r}$ need be defined only for samples $x^1, \cdots, x^r$ for which $z_i(x^1, \cdots, x^i) > 0$
for $i = 1, \cdots, r - 1$. The above mentioned uniqueness of $z_r(x^1, \cdots, x^r)$
and $\delta_{x^1 \cdots x^r}$ was meant to hold if the definition of these functions is restricted
to such samples $x^1, \cdots, x^r$.
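The correspondence between the pair of sequences $(z_r, \delta_{x^1 \cdots x^r})$ and the sequence $h_r$ can be checked numerically along one fixed sample. The following sketch is an illustration only: it assumes a finite decision space with hypothetical labels, reads $z_r$ as the conditional probability of taking a further observation after the $r$th one (which is how it enters (3.1)), and uses made-up numerical values and function names.

```python
from fractions import Fraction as Fr


def h_from_rule(z, delta):
    """Build h_r(d) from continuation probabilities z_r and terminal laws delta_r, as in (3.1)."""
    h, reach = [], Fr(1)              # reach = z_1 * ... * z_{r-1}
    for z_r, delta_r in zip(z, delta):
        h.append({d: reach * (1 - z_r) * w for d, w in delta_r.items()})
        reach *= z_r
    return h


def rule_from_h(h):
    """Recover (z_r, delta_r) from h_r for the stages reached with positive probability."""
    z, delta, reach = [], [], Fr(1)
    for h_r in h:
        stop = sum(h_r.values())      # h_r(x^1, ..., x^r, D)
        if reach == 0:                # stage never reached: z_r and delta_r stay undefined
            break
        z.append(1 - stop / reach)
        # If the rule never stops at stage r, delta_r is arbitrary and is reported as None.
        delta.append({d: w / stop for d, w in h_r.items()} if stop else None)
        reach -= stop                 # now equals z_1 * ... * z_r
    return z, delta


# Hypothetical rule along one fixed sample x^1, x^2, x^3 with decisions "a" and "b".
z = [Fr(1, 2), Fr(1, 3), Fr(0)]
delta = [{"a": Fr(1, 4), "b": Fr(3, 4)},
         {"a": Fr(1, 2), "b": Fr(1, 2)},
         {"a": Fr(1), "b": Fr(0)}]
h = h_from_rule(z, delta)
z_back, delta_back = rule_from_h(h)
```

The round trip returns the original pair because every stage in this example is reached with positive probability; the `break` corresponds to the restriction to samples with $z_i > 0$ for $i < r$ under which the uniqueness statement is meant.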

For any bounded subset $S_r$ of the $r$-dimensional sample space, let

(3.4) $H_r(S_r, D^*) = \int_{S_r} h_r(x^1, \cdots, x^r, D^*) \, dx^1 \cdots dx^r.$

Let $\{\zeta_i\}$ ($i = 0, 1, 2, \cdots$, ad inf.) be a sequence of decision rules,
and $H_{r,i}(S_r, D^*)$ be the function $H_r(S_r, D^*)$ corresponding to $\zeta_i$. We shall
say that

(3.5) $\lim_{i \to \infty} \zeta_i = \zeta_0$

if

(3.6) $\lim_{i \to \infty} H_{r,i}(S_r, D^*) = H_{r,0}(S_r, D^*)$

for any $r$, any bounded set $S_r$ and for any $D^*$ that is an element of a sequence
$\{D^*_{k_1 \cdots k_l}\}$ ($k_j = 1, \cdots, r_j$; $j = 1, \cdots, l$; $l = 1, 2, \cdots$, ad inf.) of subsets
of $D$ satisfying the following conditions:

(3.7) $\sum_{k_1} D^*_{k_1} = D$; $\sum_{k_l} D^*_{k_1 \cdots k_l} = D^*_{k_1 \cdots k_{l-1}}$,

(3.8) $D^*_{k_1 \cdots k_{l-1} 1}, \cdots, D^*_{k_1 \cdots k_{l-1} r_l}$ are disjoint,

and

(3.9) the diameter of $D^*_{k_1 \cdots k_l}$ converges to zero as $l \to \infty$ uniformly in $k_1, \cdots, k_l$.

Lemma 3.1. For any sequence $\{\zeta_i\}$ ($i = 1, 2, \cdots$, ad inf.) of decision rules
there exists a subsequence $\{\zeta_{i_j}\}$ ($j = 1, 2, \cdots$, ad inf.) and a decision rule $\zeta_0$ such
that $\lim_{j \to \infty} \zeta_{i_j} = \zeta_0$.

Proof: Let $H_{r,i}(S_r, D^*)$ ($r = 1, 2, \cdots$, ad inf.) be the sequence of functions
associated with $\zeta_i$. Let, furthermore, $\{D^*_{k_1 \cdots k_l}\}$ be a sequence of subsets of $D$
satisfying the relations (3.7), (3.8) and (3.9). Clearly, for any fixed $r$ and any
fixed element $D^*_{k_1 \cdots k_l}$ of the sequence $\{D^*_{k_1 \cdots k_l}\}$, it is possible to find a subsequence
$\{i_j\}$ ($j = 1, 2, \cdots$, ad inf.) of the sequence $\{i\}$ (the subsequence $\{i_j\}$
may depend on $r$ and $D^*_{k_1 \cdots k_l}$) and a set function $H_{r,0}(S_r)$ such that

(3.10) $\lim_{j \to \infty} H_{r,i_j}(S_r, D^*_{k_1 \cdots k_l}) = H_{r,0}(S_r).$

Using the well known diagonal procedure, it is therefore possible to find a fixed
subsequence $\{i_j\}$ (independent of $r$ and $D^*$) and a sequence of set functions
$\{H_{r,0}(S_r, D^*_{k_1 \cdots k_l})\}$ such that

(3.11) $\lim_{j \to \infty} H_{r,i_j}(S_r, D^*_{k_1 \cdots k_l}) = H_{r,0}(S_r, D^*_{k_1 \cdots k_l})$

for all values of $r$, $k_1, \cdots, k_l$ and $l$.

To complete the proof of Lemma 3.1, it remains to be shown that
there exists a decision rule $\zeta_0$ such that the associated function $H_r(S_r, D^*)$ is
equal to $H_{r,0}(S_r, D^*)$ for any $D^*$ that is an element of $\{D^*_{k_1 \cdots k_l}\}$. Since
$h_{r,i}(x^1, \cdots, x^r, D^*)$ is uniformly bounded, the set function $H_{r,0}(S_r, D^*_{k_1 \cdots k_l})$ is
absolutely continuous. Hence for any values of $k_1, \cdots, k_l$ there exists a function
$h_{r,0}(x^1, \cdots, x^r, D^*_{k_1 \cdots k_l})$ such that

(3.12) $\int_{S_r} h_{r,0}(x^1, \cdots, x^r, D^*_{k_1 \cdots k_l}) \, dx^1 \cdots dx^r = H_{r,0}(S_r, D^*_{k_1 \cdots k_l}).$

The existence of a $\zeta_0$ with the desired property is proved, if we show that the
functions $h_{r,0}(x^1, \cdots, x^r, D^*_{k_1 \cdots k_l})$ satisfy the relations (3.2) and (3.3). Let
$h_r(x^1, \cdots, x^m, D^*) = h_r(x^1, \cdots, x^r, D^*)$ for any $m > r$. Then, since the functions
$h_{r,i}$ satisfy (3.2), we have

(3.13) $\sum_{r=1}^{m} H_{r,i}(S_m, D^*) \le V(S_m)$

where $V(S_m)$ denotes the $m$-dimensional Lebesgue measure of $S_m$. From (3.13)
it follows that

(3.14) $\sum_{r=1}^{m} H_{r,0}(S_m, D^*_{k_1 \cdots k_l}) \le V(S_m).$

Hence, the functions $h_{r,0}(x^1, \cdots, x^r, D^*_{k_1 \cdots k_l})$ must satisfy (3.2) except perhaps
on a set of Lebesgue measure zero. Since the functions $h_{r,i}(x^1, \cdots, x^r, D^*)$ satisfy
(3.3), we must have

(3.15) $H_{r,i}(S_r, D^*_{k_1 \cdots k_{l-1}}) = \sum_{k_l = 1}^{r_l} H_{r,i}(S_r, D^*_{k_1 \cdots k_l}).$

Hence, the same relation must hold also for $H_{r,0}(S_r, D^*_{k_1 \cdots k_l})$. But this implies
that the functions $h_{r,0}(x^1, \cdots, x^r, D^*_{k_1 \cdots k_l})$ satisfy (3.3) except perhaps on a set
of Lebesgue measure zero, and the proof of Lemma 3.1 is completed.

Lemma 3.2. Let $T_i(S)$ ($i = 0, 1, 2, \cdots$) be a non-negative, completely additive
set function defined for all measurable subsets $S$ of the $r$-dimensional sample space
$M_r$. Assume that

(3.16) $T_i(S) \le V(S)$

for all $S$ ($i = 0, 1, 2, \cdots$, ad inf.) where $V(S)$ denotes the Lebesgue measure of $S$.
Let, furthermore, $g(x^1, \cdots, x^r)$ be a non-negative function such that

(3.17) $\int_{M_r} g(x^1, \cdots, x^r) \, dx^1 \cdots dx^r < \infty.$

If

(3.18) $\lim_{i \to \infty} T_i(S) = T_0(S)$

then

(3.19) $\lim_{i \to \infty} \int_{M_r} g(x^1, \cdots, x^r) \, dT_i = \int_{M_r} g(x^1, \cdots, x^r) \, dT_0.$

Proof: Let $M_{r,c}$ be the sphere in $M_r$ with center at the origin and radius $c$.
Clearly,

(3.20) $\lim_{c \to \infty} \int_{M_{r,c}} g(x^1, \cdots, x^r) \, dx^1 \cdots dx^r = \int_{M_r} g(x^1, \cdots, x^r) \, dx^1 \cdots dx^r.$

Hence, because of (3.16), we have

(3.21) $\lim_{c \to \infty} \Bigl[ \int_{M_{r,c}} g(x^1, \cdots, x^r) \, dT_i - \int_{M_r} g(x^1, \cdots, x^r) \, dT_i \Bigr] = 0$

uniformly in $i$. Hence our lemma is proved if we show that

(3.22) $\lim_{i \to \infty} \int_{M_{r,c}} g(x^1, \cdots, x^r) \, dT_i = \int_{M_{r,c}} g(x^1, \cdots, x^r) \, dT_0$

for any finite $c$. Let $g_A(x^1, \cdots, x^r) = g(x^1, \cdots, x^r)$ when $g(x^1, \cdots, x^r) \le A$,
and $= 0$ otherwise. Since

$\lim_{A \to \infty} \int_{M_{r,c}} (g - g_A) \, dx^1 \cdots dx^r = 0$

it follows from (3.16) that

(3.23) $\lim_{A \to \infty} \int_{M_{r,c}} (g - g_A) \, dT_i = 0$

uniformly in $i$. Hence, our lemma is proved if we can show that

(3.24) $\lim_{i \to \infty} \int_{M_{r,c}} g_A \, dT_i = \int_{M_{r,c}} g_A \, dT_0$

for any $c > 0$ and any $A > 0$. Let $S_j$ denote the set of all points in $M_{r,c}$ for
which

(3.25) $(j - 1)\epsilon \le g_A < j\epsilon$

where $\epsilon$ is a given positive number. We have

(3.26) $\sum_j (j - 1)\epsilon \int_{S_j} dT_i \le \int_{M_{r,c}} g_A \, dT_i \le \sum_j j\epsilon \int_{S_j} dT_i$ ($i = 0, 1, 2, \cdots$).

Since for any $\epsilon$, $j$ can take only a finite number of values, and since $\epsilon$ can be
chosen arbitrarily small, our lemma follows easily from (3.18) and (3.26).
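The sandwich (3.26) used in the last step can be illustrated on a toy example. The sketch below is not part of the proof: it replaces $T_i$ by a hypothetical measure concentrated on finitely many points and $g_A$ by made-up values, forms the sets $S_j$ of (3.25), and compares the lower and upper sums with the integral.

```python
import math
from fractions import Fraction as Fr


def sandwich(values, weights, eps):
    """Lower sum, integral and upper sum of (3.26) for a measure with finitely many atoms.

    values[k] is g_A at the k-th atom and weights[k] is the mass T puts on that atom.
    """
    mass = {}                                   # T(S_j) for each level j
    for g, w in zip(values, weights):
        j = math.floor(g / eps) + 1             # (j - 1) * eps <= g < j * eps
        mass[j] = mass.get(j, 0) + w
    lower = sum((j - 1) * eps * t for j, t in mass.items())
    upper = sum(j * eps * t for j, t in mass.items())
    integral = sum(g * w for g, w in zip(values, weights))
    return lower, integral, upper


# Hypothetical data: four atoms, g_A bounded by A = 2, and epsilon = 1/2.
values = [Fr(0), Fr(1, 3), Fr(5, 4), Fr(2)]
weights = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(1, 10)]
lower, integral, upper = sandwich(values, weights, Fr(1, 2))
```

The gap between the two sums equals $\epsilon$ times the total mass of the measure, which by (3.16) is at most $\epsilon V(M_{r,c})$ whatever the index $i$; this is why letting $\epsilon$ tend to zero gives (3.24) from the convergence (3.18) on the finitely many sets $S_j$.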

Lemma 3.3. Let $\{\zeta_i\}$ be a sequence of decision rules such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and
$r(F, \zeta_i)$ is a bounded function of $F$ and $i$ ($i \ge 1$). Then

(3.27) $\liminf_{i \to \infty} r(F, \zeta_i) \ge r(F, \zeta_0).$

t—00 

Proof: First we shall show that it is sufficient to prove Lemma 3.3 for any
finite space $D$. For this purpose, assume that Lemma 3.3 is true for any finite
decision space, but there exists a non-finite compact decision space $D$ and a
sequence $\{\zeta_i\}$ such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and

(3.28) $\liminf_{i \to \infty} r(F, \zeta_i) = r(F, \zeta_0) - \delta$ for some $F$ ($\delta > 0$).

Since $\zeta_i \to \zeta_0$, there exists a sequence $\{D^*_{k_1 \cdots k_l}\}$ of subsets of $D$ satisfying the
conditions (3.7)-(3.9) and such that

(3.29) $\lim_{i \to \infty} H_{r,i}(S_r, D^*_{k_1 \cdots k_l}) = H_{r,0}(S_r, D^*_{k_1 \cdots k_l})$

where $H_{r,i}(S_r, D^*)$ is the function $H_r$ associated with $\zeta_i$ ($i = 0, 1, 2, \cdots$). Let
$\lambda$ be a fixed value of $l$ and consider the corresponding finite sequence $\{D^*_{k_1 \cdots k_\lambda}\}$
of subsets of $D$. Let $k$ be the number of elements in this finite sequence. We
select one point from each element of the finite sequence $\{D^*_{k_1 \cdots k_\lambda}\}$. Let the
points selected be $d_1, d_2, \cdots, d_k$ and let $\bar{D}$ denote the set consisting of the points
$d_1, \cdots, d_k$. Let $\bar{\zeta}_i$ be the decision rule defined as follows: the function
$\bar{h}_r(x^1, \cdots, x^r, d_j)$ associated with $\bar{\zeta}_i$ is equal to $h_{r,i}(x^1, \cdots, x^r, D^*)$ where $D^*$ is
equal to the element of the finite sequence $\{D^*_{k_1 \cdots k_\lambda}\}$ which contains the point
$d_j$ ($j = 1, \cdots, k$). Clearly, because of (3.29),

(3.30) $\lim_{i \to \infty} \bar{\zeta}_i = \bar{\zeta}_0.$

Furthermore, for sufficiently large $\lambda$ we obviously have

(3.31) $| r(F, \bar{\zeta}_i) - r(F, \zeta_i) | \le \epsilon$ for $i = 0, 1, 2, \cdots$, ad inf.

Since for finite $D$ our lemma is assumed to be true, we have

(3.32) $\liminf_{i \to \infty} r(F, \bar{\zeta}_i) \ge r(F, \bar{\zeta}_0).$

Choosing $\epsilon \le \frac{\delta}{3}$, we obtain a contradiction from (3.28), (3.31) and (3.32). Thus,
it is sufficient to prove Lemma 3.3 for finite $D$. In the remainder of the proof
we shall assume that $D$ consists of the points $d_1, \cdots, d_k$.

The probability that we shall take exactly $m$ observations when $\zeta_i$ is used and
$F$ is true is given by

(3.33) prob$\{n = m \mid F, \zeta_i\} = \int_{M_m} p_m(x^1, \cdots, x^m \mid F) \, h_{m,i}(x^1, \cdots, x^m, D) \, dx^1 \cdots dx^m$

where $M_m$ denotes the $m$-dimensional sample space. Since

$\lim_{i \to \infty} H_{m,i}(S_m, D) = H_{m,0}(S_m, D),$

it follows from Lemma 3.2 that

(3.34) $\lim_{i \to \infty}$ prob$\{n = m \mid F, \zeta_i\}$ = prob$\{n = m \mid F, \zeta_0\}$.

Hence

(3.35) $\lim_{i \to \infty}$ prob$\{n \le m \mid F, \zeta_i\}$ = prob$\{n \le m \mid F, \zeta_0\}$.

Since $r(F, \zeta_i)$ is a bounded function of $F$ and $i$ ($i \ge 1$), we must have

(3.36) $\lim_{m \to \infty}$ prob$\{n \le m \mid F, \zeta_i\} = 1$ ($i = 1, 2, \cdots$)

uniformly in $F$ and $i$. From (3.35) and (3.36) it follows that

(3.37) $\lim_{m \to \infty}$ prob$\{n \le m \mid F, \zeta_0\} = 1$

uniformly in $F$. Because of (3.36) and (3.37), we have


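(3.38) $r(F, \zeta_i) = \sum_{m=1}^{\infty} r_m(F, \zeta_i)$ ($i = 0, 1, 2, \cdots$, ad inf.),

where

(3.39) $r_m(F, \zeta_i) = \sum_{j=1}^{k} \int_{M_m} p_m(x^1, \cdots, x^m \mid F) \, W(F, d_j) \, dH_{m,i}(S_m, d_j) + \int_{M_m} p_m(x^1, \cdots, x^m \mid F) \, c(x^1, \cdots, x^m) \, dH_{m,i}(S_m, D).$

Since

$\lim_{i \to \infty} H_{m,i}(S_m, D^*) = H_{m,0}(S_m, D^*)$

for any subset $D^*$ of $D$, it follows from Lemma 3.2 that

(3.40) $\lim_{i \to \infty} r_m(F, \zeta_i) = r_m(F, \zeta_0).$

Lemma 3.3 is an immediate consequence of (3.38) and (3.40).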

3.4. Equality of Sup Inf r and Inf Sup r, and other theorems. In this section
we shall prove the main theorems for the continuous case, using the lemmas derived
in the preceding section.

Theorem 3.1. If Conditions 3.1-3.5 are fulfilled, then

$\operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r(\xi, \zeta) = \operatorname{Inf}_{\zeta} \operatorname{Sup}_{\xi} r(\xi, \zeta).$

Proof: Let $Z^m$ denote the class of all $\zeta$'s for which prob$\{n \le m \mid \zeta, F\} = 1$
for all $F$. We shall denote an element of $Z^m$ by $\zeta^m$. First we shall show that it
is sufficient if for any finite $m$ we can prove Theorem 3.1 under the restriction
that $\zeta$ must be an element of $Z^m$. For this purpose, put $W_0 = \operatorname{Sup}_{F, d} W(F, d)$
and choose a positive integer $m_\epsilon$ so that

(3.42) $c(x^1, \cdots, x^m) > \frac{W_0^2}{\epsilon}$

for all $m \ge m_\epsilon$. The existence of such a value $m_\epsilon$ follows from Condition 3.4.
We shall now show that for any $\xi$ we have

(3.43) $\operatorname{Inf}_{\zeta^m} r(\xi, \zeta^m) \le \operatorname{Inf}_{\zeta} r(\xi, \zeta) + \epsilon$ for any $m \ge m_\epsilon$.

Let $\zeta_1$ be any decision rule. There are two cases to be considered: (a) prob
$\{n \ge m_\epsilon \mid \xi, \zeta_1\} \ge \frac{\epsilon}{W_0}$; (b) prob$\{n \ge m_\epsilon \mid \xi, \zeta_1\} < \frac{\epsilon}{W_0}$. In case (a) we have
$r(\xi, \zeta_1) \ge W_0$. In this case, let $\zeta_2$ be the rule that we decide for some $d$ without
taking any observations. Clearly, we shall have $r(\xi, \zeta_2) \le W_0$ and, therefore,
$r(\xi, \zeta_2) \le r(\xi, \zeta_1)$. In case (b), let $\zeta_2$ be defined as follows: $h_r(x^1, \cdots, x^r, D^*)$
for $\zeta_2$ is the same as that for $\zeta_1$ when $r < m_\epsilon$, and $h_r(x^1, \cdots, x^r, d_0)$ for $\zeta_2$ is equal
to $1 - \sum_{k=1}^{m_\epsilon - 1} h_k(x^1, \cdots, x^k, D)$ when $r = m_\epsilon$, and zero when $r > m_\epsilon$, where $d_0$ is a
fixed element of $D$. Since prob$\{n \ge m_\epsilon \mid \xi, \zeta_1\} < \frac{\epsilon}{W_0}$, we have

$r(\xi, \zeta_2) \le r(\xi, \zeta_1) + \epsilon.$

In both cases $\zeta_2$ is an element of $Z^{m_\epsilon}$. Hence (3.43) is proved. From (3.43) we
obtain

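(3.44) $\operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r \le \operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta^{m_\epsilon}} r \le \operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r + \epsilon.$

Assume now that

(3.45) $\operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta^m} r = \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi} r$

holds for any $m$. From (3.44) and (3.45) we obtain

(3.46) $\operatorname{Inf}_{\zeta^{m_\epsilon}} \operatorname{Sup}_{\xi} r \le \operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r + \epsilon.$

Hence

(3.47) $\operatorname{Inf}_{\zeta} \operatorname{Sup}_{\xi} r \le \operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r + \epsilon.$

Since this is true for any $\epsilon$, we have

(3.48) $\operatorname{Inf}_{\zeta} \operatorname{Sup}_{\xi} r \le \operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta} r.$

Theorem 3.1 follows from (3.48) and Lemma 1.3.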





To complete the proof of Theorem 3.1, it remains to be shown that (3.45)
holds for any $m$. Since $D$ is compact, (3.45) is proved if we can prove it for any
finite $D$. In the remainder of the proof we shall, therefore, assume that $D$ consists
of $k$ points $d_1, \cdots, d_k$. Let $\omega$ be a subset of $\Omega$ that is conditionally compact
in the sense of the metric$^7$

(3.49) $\delta_0(F_1, F_2) = \operatorname{Sup}_{S_m} \Bigl| \int_{S_m} dF_1 - \int_{S_m} dF_2 \Bigr|$

where $S_m$ is a subset of the $m$-dimensional sample space. We shall show that $\omega$
is conditionally compact also in the sense of the intrinsic metric given by

(3.50) $\delta_1(F_1, F_2) = \operatorname{Sup}_{\zeta^m} | r(F_1, \zeta^m) - r(F_2, \zeta^m) |.$

Let

(3.51) $\delta_2(F_1, F_2) = \operatorname{Sup}_{d} | W(F_1, d) - W(F_2, d) |$

and

(3.52) $\delta_3(F_1, F_2) = \delta_0(F_1, F_2) + \delta_2(F_1, F_2).$

It follows from Condition 3.3 and Theorem 3.1 in [3] that $\Omega$, and therefore
also $\omega$, is conditionally compact in the sense of the metric $\delta_2(F_1, F_2)$. Hence $\omega$
is conditionally compact in the sense of the metric $\delta_3(F_1, F_2)$. The conditional
compactness of $\omega$ relative to the metric $\delta_1(F_1, F_2)$ is proved, if we can show that
any sequence $\{F_i\}$ that is a Cauchy sequence relative to the metric $\delta_3$ is a Cauchy
sequence also relative to the metric $\delta_1$. Let $\{F_i\}$ ($i = 1, 2, \cdots$, ad inf.) be a
Cauchy sequence relative to $\delta_3$. Then there exists a distribution $F_0$ (not necessarily
an element of $\Omega$) and a function $W(d)$ such that

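(3.53) $\lim_{i \to \infty} W(F_i, d) = W(d)$ uniformly in $d$

and

(3.54) $\lim_{i \to \infty} \int_{S_m} dF_i = \int_{S_m} dF_0$

uniformly in $S_m$. We have

(3.55) $r(F_i, \zeta^m) = \sum_{r=1}^{m} \sum_{j=1}^{k} \int_{M_r} p_r(x^1, \cdots, x^r \mid F_i) \, W(F_i, d_j) \, h_r(x^1, \cdots, x^r, d_j) \, dx^1 \cdots dx^r + \sum_{r=1}^{m} \int_{M_r} c(x^1, \cdots, x^r) \, p_r(x^1, \cdots, x^r \mid F_i) \, h_r(x^1, \cdots, x^r, D) \, dx^1 \cdots dx^r,$

7 By $\int_{S_m} dF$ we mean $\int_{S_m} p_m(x^1, \cdots, x^m \mid F) \, dx^1 \cdots dx^m$.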





where $M_r$ denotes the $r$-dimensional sample space. The sequence $\{F_i\}$ is a
Cauchy sequence relative to the metric $\delta_1$ if there exists a function $r(\zeta^m)$ such that

(3.56) $\lim_{i \to \infty} r(F_i, \zeta^m) = r(\zeta^m)$

uniformly in $\zeta^m$. Let $\bar{r}(F_i, \zeta^m)$ be the function we obtain from $r(F_i, \zeta^m)$ by
replacing the factor $W(F_i, d_j)$ by $W(d_j)$ under the first integral on the right
hand side of (3.55). Because of (3.53), we have

(3.57) $\lim_{i \to \infty} [ r(F_i, \zeta^m) - \bar{r}(F_i, \zeta^m) ] = 0$

uniformly in $\zeta^m$. Thus, (3.56) is proved if we can show the existence of a function
$\bar{r}(\zeta^m)$ such that

(3.58) $\lim_{i \to \infty} \bar{r}(F_i, \zeta^m) = \bar{r}(\zeta^m)$

uniformly in $\zeta^m$. Let $C$ be a class of functions $\varphi(x^1, \cdots, x^m)$ such that

$| \varphi(x^1, \cdots, x^m) | < A < \infty$ for all $\varphi$ in $C$.

It then follows from (3.54) that there exists a functional $g(\varphi)$ such that

(3.59) $\lim_{i \to \infty} \int_{M_m} \varphi \, dF_i = g(\varphi)$

uniformly in $\varphi$. Application of this general result yields (3.58) immediately.
Hence, $\{F_i\}$ is a Cauchy sequence relative to the metric $\delta_1$ and, therefore, $\omega$ is
shown to be conditionally compact relative to the metric $\delta_1$ if it is relative to
the metric $\delta_0$.

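It then follows from Theorem 3.2 in [3] that $\operatorname{Sup}_{\xi} \operatorname{Inf}_{\zeta^m} r = \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi} r$ if we replace
$\Omega$ by a subset $\omega$ that is conditionally compact relative to $\delta_0$.$^8$ Since $\Omega$ is separable
relative to $\delta_0$, there exists a sequence $\{\Omega_i\}$ of subsets of $\Omega$ such that $\Omega_i$ is conditionally
compact relative to $\delta_0$, $\Omega_{i+1} \supset \Omega_i$ and $\sum_i \Omega_i = \Omega^*$ is dense in $\Omega$. Let
$\xi^i$ denote an a priori distribution $\xi$ for which $\xi(\Omega_i) = 1$. Since the left and right
hand members in (3.45) remain unchanged when $\Omega$ is replaced by $\Omega^*$, it follows
from Theorem 1.3 that equation (3.45) is proved if we can show that

(3.60) $\lim_{i \to \infty} \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi^i} r = \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi} r.$

Let $\{\zeta^m_i\}$ ($i = 1, 2, \cdots$, ad inf.) be a sequence of decision rules such that

(3.61) $\lim_{i \to \infty} \bigl[ \operatorname{Sup}_{\xi^i} r(\xi^i, \zeta^m_i) - \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi^i} r \bigr] = 0.$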


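8 Strictly, we would have to write $\operatorname{Inf}_{\eta^m}$ instead of $\operatorname{Inf}_{\zeta^m}$ where $\eta^m$ is a probability measure in
the space of all $\zeta^m$. But, since the use of any discrete probability measure $\eta^m$ is equivalent to
the use of a $\zeta^m$, and since the restriction to discrete $\eta^m$ does not change $\operatorname{Sup}_{\xi} \operatorname{Inf}_{\eta^m} r$ or $\operatorname{Inf}_{\eta^m} \operatorname{Sup}_{\xi} r$,
we can replace $\operatorname{Inf}_{\eta^m}$ by $\operatorname{Inf}_{\zeta^m}$.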





According to Lemmas 3.1 and 3.3, there exists a subsequence $\{\zeta^m_{i_j}\}$ of $\{\zeta^m_i\}$ and
a decision rule $\zeta_0$ such that

(3.62) $\liminf_{j \to \infty} r(F, \zeta^m_{i_j}) \ge r(F, \zeta_0)$ for all $F$.

Since $\Omega^*$ is dense in $\Omega$, it follows from (3.61) and (3.62) that

(3.63) $\operatorname{Sup}_{F} r(F, \zeta_0) \le \lim_{i \to \infty} \operatorname{Inf}_{\zeta^m} \operatorname{Sup}_{\xi^i} r$

and, therefore, (3.60) holds. Thus, (3.45) is proved and the proof of Theorem
3.1 is completed.

Theorem 3.2. If Conditions 3.1-3.5 are fulfilled, then there exists a minimax
solution, i.e., a decision rule $\zeta_0$ for which

(3.64) $\operatorname{Sup}_{F} r(F, \zeta_0) \le \operatorname{Sup}_{F} r(F, \zeta)$ for all $\zeta$.

Proof: Because of Theorem 3.1 there exists a sequence $\{\zeta_i\}$ ($i = 1, 2, \cdots$, ad
inf.) of decision rules such that

(3.65) $\lim_{i \to \infty} \operatorname{Sup}_{F} r(F, \zeta_i) = \operatorname{Inf}_{\zeta} \operatorname{Sup}_{F} r(F, \zeta).$

According to Lemmas 3.1 and 3.3 there exists a subsequence $\{\zeta_{i_j}\}$ of $\{\zeta_i\}$ and a
decision rule $\zeta_0$ such that

(3.66) $\liminf_{j \to \infty} r(F, \zeta_{i_j}) \ge r(F, \zeta_0)$ for all $F$.

It follows from (3.65) and (3.66) that $\zeta_0$ is a minimax solution and Theorem
3.2 is proved.

Theorem 3.3. If Conditions 3.1-3.6 are fulfilled, then for any $\xi$ there exists a Bayes solution relative to $\xi$.

This theorem is an immediate consequence of Lemmas 3.1 and 3.3.

Theorem 3.4. If Conditions 3.1-3.5 are fulfilled, then the class of all Bayes 
solutions in the wide sense is a complete class. 

The proof is omitted, since it is entirely analogous to that of Theorem 2.5.

3.5. Formulation of an additional condition. In this section we shall formulate an additional condition which will permit the derivation of some stronger theorems. Let the metric $\delta_0(F_1, F_2)$ be defined by

$\delta_0(F_1, F_2) = \sum_{m=1}^{\infty} \frac{1}{m^2} \sup_{S_m} \left| \int_{S_m} dF_1 - \int_{S_m} dF_2 \right|$

where $S_m$ may be any subset of the m-dimensional sample space.

Condition 3.6. The space $\Omega$ is compact relative to the metric $\delta_0(F_1, F_2)$, and

$\lim_{i \to \infty} W(F_i, d) = W(F_0, d)$

uniformly in $d$ if $\lim_{i \to \infty} \delta_0(F_i, F_0) = 0$.

Theorem 3.5. If Conditions 3.1-3.6 hold, then

(i) there exists a least favorable a priori distribution;
(ii) any minimax solution is a Bayes solution in the strict sense;
(iii) for any decision rule $\zeta_0$ which is not a Bayes solution in the strict sense and for which $r(F, \zeta_0)$ is a bounded function of $F$ there exists a decision rule $\zeta_1$ which is a Bayes solution in the strict sense and is uniformly better than $\zeta_0$.

Proof: The proofs of (i) and (ii) are entirely analogous to those of (i) and (ii) in Theorem 2.6, and will therefore be omitted here.

To prove (iii), let $\zeta_0$ be a decision rule that is not a Bayes solution in the strict sense and for which $r(F, \zeta_0)$ is bounded. We replace the weight function $W(F, d)$ by $W^*(F, d) = W(F, d) - r(F, \zeta_0)$. We shall show that (i) remains valid when $W(F, d)$ is replaced by $W^*(F, d)$. This is not obvious, since $r(F, \zeta_0)$, and therefore also $W^*(F, d)$, may not be continuous in $F$. First we shall prove that

(3.67)    $\liminf_{i \to \infty} r(\xi_i, \zeta_0) \geq r(\xi_0, \zeta_0)$

for any sequence $\{\xi_i\}$ for which

$\lim_{i \to \infty} \xi_i(\omega) = \xi_0(\omega)$

for any open subset $\omega$ of $\Omega$ (in the sense of the metric $\delta_0$) whose boundary has probability measure zero according to $\xi_0$. Let $r_m(F, \zeta)$ denote the conditional expected value of the loss $W(F, d)$ plus the cost of experimentation when $n = m$, $F$ is true and the rule $\zeta$ is used by the statistician (see equation (3.39)). Since $W(F, d)$ and the cost of experimentation when $m$ observations are taken are uniformly bounded, one can easily verify that

(3.68)    $\lim_{i \to \infty} r_m(F_i, \zeta_0) = r_m(F_0, \zeta_0)$

for any sequence $\{F_i\}$ for which

(3.69)    $\lim_{i \to \infty} \delta_0(F_i, F_0) = 0.$

Hence, since $\Omega$ is compact (Condition 3.6),

(3.70)    $\lim_{i \to \infty} r_m(\xi_i, \zeta_0) = r_m(\xi_0, \zeta_0)$

where

(3.71)    $r_m(\xi, \zeta_0) = \int_{\Omega} r_m(F, \zeta_0) \, d\xi.$

Since

$r(\xi, \zeta_0) = \sum_{m=1}^{\infty} r_m(\xi, \zeta_0)$

inequality (3.67) follows from (3.70).

The remainder of the proof of (iii) will be omitted here, since it is the same as that of (iii) in Theorem 2.6.





We shall now replace Condition 3.6 by the following weaker one.

Condition 3.6*. There exists a sequence $\{\Omega_i\}$ ($i = 1, 2, \cdots$, ad inf.) of subsets of $\Omega$ such that Condition 3.6 is fulfilled when $\Omega$ is replaced by $\Omega_i$, $\Omega_{i+1} \supset \Omega_i$ and

$\lim_{i = \infty} \Omega_i = \Omega.$

Theorem 3.6. If Conditions 3.1-3.5 and 3.6* are fulfilled then

(i) A minimax solution $\zeta_0$ and a sequence $\{\zeta_i\}$ ($i = 1, 2, \cdots$, ad inf.) exist such that $\lim_{i \to \infty} \zeta_i = \zeta_0$ and $\zeta_i$ ($i = 1, 2, \cdots$, ad inf.) is a Bayes solution in the strict sense.

(ii) For any decision rule $\zeta_0$ for which $r(F, \zeta_0)$ is bounded there exists another decision rule $\zeta_1$ such that $\zeta_1$ is a limit of a sequence of Bayes solutions in the strict sense and $r(F, \zeta_1) \leq r(F, \zeta_0)$ for all $F$ in $\Omega$.

Proof: According to Theorem 3.5, for each $i$ there exists a decision rule $\zeta_i$ ($i = 1, 2, \cdots$, ad inf.) such that $\zeta_i$ is a minimax solution and a Bayes solution in the strict sense when $\Omega$ is replaced by $\Omega_i$. Let $\{\zeta_{i_j}\}$ be a subsequence of the sequence $\{\zeta_i\}$ such that $\{\zeta_{i_j}\}$ admits a limit $\zeta_0$, i.e., $\lim_{j \to \infty} \zeta_{i_j} = \zeta_0$. Because of Lemma 3.3,

(3.72)    $\liminf_{j \to \infty} r(F, \zeta_{i_j}) \geq r(F, \zeta_0).$

Hence $\zeta_0$ is a minimax solution relative to the original space $\Omega$ and statement (i) is proved.

To prove (ii), replace $W(F, d)$ by $W^*(F, d) = W(F, d) - r(F, \zeta_0)$ where $\zeta_0$ is a decision rule for which $r(F, \zeta_0)$ is bounded. In proving statement (iii) of Theorem 3.5, we have shown that there exists a decision rule $\zeta_i$ ($i = 1, 2, \cdots$, ad inf.) such that $\zeta_i$ is a minimax solution and a Bayes solution in the strict sense when $\Omega$ is replaced by $\Omega_i$ and $W(F, d)$ by $W^*(F, d)$. Clearly, $\zeta_i$ remains a Bayes solution in the strict sense also relative to $\Omega$ and $W(F, d)$. Since $\zeta_i$ is a minimax solution relative to $\Omega_i$ and $W^*(F, d)$, we have

(3.73)    $r(F, \zeta_i) \leq r(F, \zeta_0)$ for all $F$ in $\Omega_i$.

Let $\{\zeta_{i_j}\}$ be a convergent subsequence of $\{\zeta_i\}$ and let $\lim_{j \to \infty} \zeta_{i_j} = \zeta_1$. Then, because of Lemma 3.3, we have

$r(F, \zeta_1) \leq r(F, \zeta_0)$ for all $F$ in $\Omega$.

Since $\zeta_1$ is a limit of a sequence of Bayes solutions in the strict sense, statement (ii) is proved.

Addition at proof reading. After this paper was sent to the printer the author found that $\Omega$ is always separable (in the sense of the convergence definition in Condition 3.5) and, therefore, Condition 3.5 is unnecessary. A proof of the separability of $\Omega$ will appear in a forthcoming publication of the author.

The boundedness of $r(F, \zeta_i)$ is not necessary for the validity of Lemma 3.3. Let $\lim_{i \to \infty} \zeta_i = \zeta_0$ and suppose that for some $F$, say $F_0$, $r(F_0, \zeta_i)$ is not bounded in $i$. If $\liminf_{i \to \infty} r(F_0, \zeta_i) = \infty$, Lemma 3.3 obviously holds for $F = F_0$. If $\liminf_{i \to \infty} r(F_0, \zeta_i) = g < \infty$, let $\{i_j\}$ be a subsequence of $\{i\}$ such that $\lim_{j \to \infty} r(F_0, \zeta_{i_j}) = g$. Since $r(F_0, \zeta_{i_j})$ is a bounded function of $j$, Lemma 3.3 is applicable and we obtain $g \geq r(F_0, \zeta_0)$. In a similar way, one can see that Lemma 2.4 also remains valid without assuming the boundedness of the risk.

Although not stated explicitly, several functions considered in this paper are assumed to be measurable with respect to certain additive classes of subsets. In the continuous case, for example, the precise measurability assumptions may be stated as follows: Let $B$ be the class of all Borel subsets of the infinite dimensional sample space $M$. Let $H$ be the smallest additive class of subsets of $\Omega$ which contains any subset of $\Omega$ which is open in the sense of at least one of the convergence definitions considered in this paper. Let $T$ be the smallest additive class of subsets of $D$ which contains all open subsets of $D$ (in the sense of the metric $\delta(d_1, d_2, \Omega)$). By the symbolic product $H \times T$ we mean the smallest additive class of subsets of the Cartesian product $\Omega \times D$ which contains the Cartesian product of any member of $H$ by any member of $T$. The symbolic product $H \times B$ is similarly defined. It is assumed that: (1) $W(F, d)$ is measurable $(H \times T)$; (2) $p_m(x^1, \cdots, x^m \mid F)$ is measurable $(B \times H)$; (3) $\delta_{x^1 \cdots x^r}(D^*)$ is measurable $(B)$ for any member $D^*$ of $T$; (4) $z_r(x^1, \cdots, x^r)$ and $c_r(x^1, \cdots, x^r)$ are measurable $(B)$. These assumptions are sufficient to insure the measurability $(H)$ of $r(F, \zeta)$ for any $\zeta$.

REFERENCES 

[1] J. v. Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior, Princeton University Press, 1944.

[2] A. Wald, "Generalization of a theorem by v. Neumann concerning zero sum two-person games," Annals of Mathematics, Vol. 46 (April, 1945).

[3] A. Wald, "Foundations of a general theory of sequential decision functions," Econometrica, Vol. 15 (October, 1947).



THE MULTIPLICATIVE PROCESS 
By Richard Otter 1 ’ 2 
University of Notre Dame 

1. Introduction and summary. The multiplicative process is usually defined by the sequence of random variables $X_0, X_1, \cdots$ whose distributions are specified as follows: $P(X_0 = 1) = 1$, $\sum_{\nu=0}^{\infty} P(X_1 = \nu) = 1$, and if $X_n = 0$ then $P(X_{n+1} = 0) = 1$, whereas if $X_n$ is a positive integer then $X_{n+1}$ is distributed as the sum of $X_n$ independent random variables each with the distribution of $X_1$. The variable $X_n$ is interpreted as the number of "particles" in the nth generation, and the index $n$ as a discrete time parameter. This has been the method of approach in previous studies of the process [1, 2, 3, 4, 5]. The multiplicative process has various applications, notably in the study of population growth, the spread of epidemics or rumors, and the nuclear chain reaction. The closely related "birth and death" process was recently studied by Kendall [6].
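The definition above can be tried out numerically. The following is a minimal sketch, not part of the original paper: the function name `simulate_generations` and its parameters are hypothetical, and the offspring distribution is passed as a list whose entry `offspring_probs[v]` is assumed to stand for $P(X_1 = v)$.

```python
import random


def simulate_generations(offspring_probs, n_generations, rng):
    """Return [X_0, X_1, ..., X_n] for one realization of the process.

    offspring_probs[v] is taken to be P(X_1 = v) (an assumed input format).
    rng is a random.Random instance, so runs can be made reproducible.
    """
    values = list(range(len(offspring_probs)))
    sizes = [1]  # P(X_0 = 1) = 1: the process starts with one particle
    for _ in range(n_generations):
        current = sizes[-1]
        if current == 0:
            # Once a generation is empty, every later generation is empty.
            sizes.append(0)
            continue
        # X_{n+1} is the sum of X_n independent copies of X_1.
        children = rng.choices(values, weights=offspring_probs, k=current)
        sizes.append(sum(children))
    return sizes
```

For instance, `simulate_generations([0.25, 0.0, 0.75], 10, random.Random(1))` draws eleven generation sizes for a process in which each particle has either no descendants or two.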

Whenever one studies the probability theory of a particular system there seem to be definite conceptual advantages in defining explicitly the set $\mathcal{T}$ of elementary events, the additive class $\mathfrak{M}$ of subsets of $\mathcal{T}$, called events, and the probability measure $P$ for the events of $\mathfrak{M}$. Now an elementary event of this process can be represented by a rooted tree where the original particle is represented by the root vertex and where the particles of the nth generation are represented by the vertices $n$ segments removed from the root. The tree will be finite or infinite according to whether a finite or an infinite number of particles are involved in the elementary event. Thus, the set of trees is the natural choice for $\mathcal{T}$. The first part of this paper is devoted to a more precise description of $\mathcal{T}$, $\mathfrak{M}$ and $P$. We shall then see easily that $X_n(t)$, the number of vertices $n$ segments removed from the root of $t \in \mathcal{T}$, i.e. the number of particles in the nth generation, has the distribution defined in the preceding paragraph. Since the time does not appear in our description of $\mathcal{T}$ we fetter ourselves somewhat if we interpret $n$ as a discrete time parameter. Thus, we have already reaped some harvest from considering the process from the point of view of $\mathcal{T}$. Another advantage is that we are led in a natural way to study the distribution of other structural features of the trees, e.g. the total number of vertices, or the number of vertices with $k$ outgoing segments.

The chief results of this paper are as follows. The recursion formula for the probability $P_n$ that a tree have $n$ vertices, $n = 1, 2, \cdots$, is obtained as well as an asymptotic estimate of $P_n$ valid for large $n$. The distributions of the number of branches at the root in a finite tree, an infinite tree, or in a tree with $n$ vertices are obtained and the asymptotic distribution of the latter as $n \to \infty$. The distribution of the fraction of vertices with $k$ outgoing segments in the finite trees, in the trees with $n$ vertices, and the asymptotic distribution of the latter as $n \to \infty$ are also found. Finally, an estimate is obtained for the probability that a tree be finite in case this probability is near 1, a result which was previously obtained by Kolmogoroff [7].

---

1 Research under an Office of Naval Research contract.

2 The author wishes to express his gratitude to Professor E. Artin of Princeton University for the suggestion of this problem and his encouragement towards its solution.

2. The space of trees. We shall use the notation $\{a\}$, $\{a_1, a_2, \cdots, a_n\}$, $\{a_j\}_{j \in J}$ and $\{a_j \mid R\}_{j \in J}$ to denote the sets which consist of respectively the single element $a$, the elements $a_1, a_2, \cdots, a_n$, all $a_j$ with $j \in J$, and all $a_j$ with the property $R$ and $j \in J$. We denote the union of two sets $A$ and $B$ by $A + B$, their intersection by $AB$, and the cartesian product of $n$ identical factors each of which is $A$ by $A^{(n)}$.

Let $I$ denote the set of positive integers. We assume given for each $n \in I$ a countable set $U_n$ of objects $u_{i_1 i_2 \cdots i_n}$ called vertices, i.e.

$U_n = \{u_{i_1 i_2 \cdots i_n}\}_{(i_1, i_2, \cdots, i_n) \in I^{(n)}}.$

Let $u_0$ be a vertex distinct from all the other vertices and let $U = \{u_0\} + \sum_n U_n$ be the collection of all the vertices. We shall interpret $u_0$ as the original parent particle and the vertex $u_{152}$, for example, as the second son of the fifth son of the first son of the original particle. If $s$ is a subset of $U$, $s \subset U$, and if $i_1, i_2, \cdots, i_{n+m}$ are such that $u_{i_1 \cdots i_n}, u_{i_1 \cdots i_{n+1}}, \cdots, u_{i_1 \cdots i_{n+m}}$ each belong to $s$ then this set of vertices is called a path from $u_{i_1 \cdots i_n}$ to $u_{i_1 \cdots i_{n+m}}$ in $s$ and $m \geq 0$ is the length of the path or the distance from $u_{i_1 \cdots i_n}$ to $u_{i_1 \cdots i_{n+m}}$. If $m = 1$ we call the path a segment, for short.

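For the sake of convenience let us agree to put $u_{i_1 i_2 \cdots i_{n-1} 0} = u_{i_1 i_2 \cdots i_{n-1}}$ $(n \geq 1)$;

then we define $W(s, u)$, for $u \in s \subset U$, to be the number of segments from $u$ in $s$, and we call $W(s, u)$ the type of the vertex $u$ in $s$. If $t$ is a subset of $U$, then we call $t$ a tree if and only if

(1) $W(t, u) < \infty$ for $u \in t$

and

(2) $u_{i_1 i_2 \cdots i_n} \in t$ implies $u_{i_1 i_2 \cdots i_{n-1} \nu} \in t$ for $\nu = 0, 1, \cdots, i_n$.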

Let $\mathcal{T}$ be the set of all trees. The condition (2) clearly implies that for each $t \in \mathcal{T}$ we have $u_0 \in t$ and that there is a unique path from $u_0$ to any other vertex of $t$. Hence, whenever a path exists between any two vertices of $t$ it is unique. We call $u_0$ the root of $t$. If for $u \in t \in \mathcal{T}$ we have $W(t, u) = 0$ then $u$ is called an endpoint of $t$, and the vertices of $t$ which are not endpoints are called inner vertices. (It is to be noted that the objects we call trees here are rooted trees in the sense of Cayley but our trees have their vertices numbered as well. Usually one would identify the trees $\{u_0, u_1, u_2, u_{11}\}$ and $\{u_0, u_1, u_2, u_{21}\}$, but we do not wish to do so because for us it is distinctly different whether the grandson is sired by the first son or by the second son.)
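The definitions of this section can be sketched as a small data structure. In the following illustration, which is not part of the original paper, a vertex $u_{i_1 \cdots i_n}$ is encoded as the tuple `(i_1, ..., i_n)` and the root $u_0$ as the empty tuple; the function names are hypothetical. The reading of condition (2) used here, namely that a vertex can belong to a tree only together with its parent and all of its elder brothers, is an assumption, since that condition is only partly legible in the source.

```python
def is_tree(t):
    """Check the assumed form of condition (2) for a finite vertex set t."""
    if not t:
        return False
    for v in t:
        if not v:
            continue  # the root has no parent to check
        parent, last = v[:-1], v[-1]
        if parent not in t:
            return False
        # Assumed reading: the sons of a vertex are numbered 1, 2, ... without gaps.
        if any(parent + (k,) not in t for k in range(1, last)):
            return False
    return True


def vertex_type(t, u):
    """W(t, u): the number of segments leading from u to its sons in t."""
    return sum(1 for v in t if len(v) == len(u) + 1 and v[:len(u)] == u)


def endpoints(t):
    """Vertices of type 0, in lexicographic order."""
    return sorted(u for u in t if vertex_type(t, u) == 0)


def branch(t, u):
    """b(t, u): all vertices on paths from u in t, u itself included."""
    return {v for v in t if v[:len(u)] == u}
```

With this encoding the two trees mentioned in the parenthetical remark are `{(), (1,), (2,), (1, 1)}` and `{(), (1,), (2,), (2, 1)}`, which are distinct sets, in line with the decision not to identify them.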

For $u \in t \in \mathcal{T}$ we define the branch of $t$ at $u$, denoted $b(t, u)$, to be the set of all vertices belonging to any path from $u$ in $t$. Our convention of admitting paths of length 0 implies that $u \in b(t, u)$. In fact, if $W(t, u) = 0$ then $b(t, u) = \{u\}$. If $t'$ is a tree such that $t' \subset t$ then we call $t$ an extension of $t'$, denoted $t > t'$ or $t' < t$, if $W(t', u) > 0$ implies $W(t', u) = W(t, u)$. Thus $t > t'$ is equivalent to $t \supset t'$ and

$t = t' + \sum_u b(t, u)$

where $u$ runs through all the endpoints of $t'$. The extension relation imposes a partial ordering upon $\mathcal{T}$.

The extension $t$ of $t'$ is interpreted as a possible future aspect of a family tree when its structure at present is given by $t'$, all present members of the family who have progeny being regarded as sterile.

If $u = u_{i_1 i_2 \cdots i_n}$ then the mapping $\varphi$ defined for the vertices of $b(t, u)$ by putting

$\varphi(u_{i_1 i_2 \cdots i_n i_{n+1} \cdots i_{n+m}}) = u_{i_{n+1} \cdots i_{n+m}}$

maps $b(t, u)$ one to one onto a tree $\varphi(b(t, u))$ in such a fashion that if $\{v_1, v_2\}$ is a segment from $v_1$ to $v_2$ in $b(t, u)$ then $\{\varphi(v_1), \varphi(v_2)\}$ is a segment from $\varphi(v_1)$ to $\varphi(v_2)$ in $\varphi(b(t, u))$. We call the mapping $\varphi$ a homeomorphism and we say that $b(t, u)$ is homeomorphic to $\varphi(b(t, u))$.

If a tree contains a finite number of vertices then it is called a finite tree; otherwise it is an infinite tree. Let $\mathcal{F}$ denote the set of all finite trees and $\mathcal{I}$ the set of all infinite trees, and let $\mathcal{K}$ denote the set of non-negative integers. For each $k \in \mathcal{K}$ we define $Y_k(t)$ for $t \in \mathcal{T}$ to be the number of vertices of type $k$ in $t$. When it is clear to which tree $t$ we refer we shall usually abbreviate $Y_0(t)$ by $m$, and we agree not to use the letter $m$ with any other connotation. For each $T \in \mathcal{F}$ let $e_1(T), e_2(T), \cdots, e_m(T)$ denote its $m$ endpoints. We then define for $T \in \mathcal{F}$ and $\kappa = (k_1, k_2, \cdots, k_m) \in \mathcal{K}^{(m)}$

$[T, \kappa] = \{t \mid t > T, \ W(t, e_i(T)) = k_i, \ i = 1, 2, \cdots, m\}_{t \in \mathcal{T}},$

and we call $[T, \kappa]$ a neighborhood. For each $t \in [T, \kappa]$ we say $[T, \kappa]$ is a neighborhood of $t$. Then it is easy to show that $\mathcal{T}$ is a topological space where the neighborhoods defined above form the defining system of neighborhoods [8].

3. The measure theory in $\mathcal{T}$. In the following paragraphs an outline of the measure theory in $\mathcal{T}$ is given which omits proofs for the most part since they are easily constructed. The only point of difficulty arises in showing the measure function to be completely additive, but here the outline has more detail.

Let $\mathfrak{S}$ be the collection of subsets of $\mathcal{T}$ such that $\emptyset \in \mathfrak{S}$ and any other set $S$ belongs to $\mathfrak{S}$ if and only if there is a $t \in \mathcal{F}$ and a non-void "rectangle set" $A = A_1 \times A_2 \times \cdots \times A_m \subset \mathcal{K}^{(m)}$, $m = Y_0(t)$, such that

(3) $S = \sum_{\kappa \in A} [t, \kappa]$

where the sets $A_1, A_2, \cdots, A_m$ may be finite or infinite sets of non-negative integers. The collection of neighborhoods which appear as terms in (3), i.e. $\{[t, \kappa]\}_{\kappa \in A}$, we call an $\mathfrak{S}$-partition of $S$, and $t$ is called the generator of the $\mathfrak{S}$-partition. Only a finite number of $\mathfrak{S}$-partitions are possible for an $S \in \mathfrak{S}$, because only a finite number of trees can possibly be generators and there is only one $\mathfrak{S}$-partition per generator. With respect to our partial ordering of the trees all possible generators lie between two particular ones. We call the smaller of these the irreducible generator and the corresponding $\mathfrak{S}$-partition the irreducible $\mathfrak{S}$-partition of $S$. Any partition of $S$ into neighborhoods must be a subpartition of this irreducible $\mathfrak{S}$-partition. The elements of $\mathfrak{S}$ also display two important properties of the rectangles in Euclidean space, namely if $S, S' \in \mathfrak{S}$ then

(4) $SS' \in \mathfrak{S}$

and if $S \subset S'$ then there is a finite chain

(5) $S = S_0 \subset S_1 \subset \cdots \subset S_n = S'$

such that $S_i - S_{i-1} \in \mathfrak{S}$ for $i = 1, 2, \cdots, n$.

A class of sets with the properties (4) and (5) has been called a half-ring by von Neumann [9].

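Let $p_0, p_1, \cdots$ be given non-negative numbers such that $\sum_{\nu} p_\nu = 1$. For $t \in \mathcal{F}$ let us put

(6) $p(t) = \prod_{\nu=1}^{\infty} p_\nu^{Y_\nu(t)}$

with the convention $0^0 = 1$. We then define the measure function $P$ for the sets in $\mathfrak{S}$ by

$P(\emptyset) = 0,$

(7) $P([t, \kappa]) = \left( \prod_{i=1}^{m} p_{k_i} \right) p(t)$, where $\kappa = (k_1, k_2, \cdots, k_m) \in \mathcal{K}^{(m)}$,

$P(S) = \sum_{\kappa \in A} P([t, \kappa])$, where $\{[t, \kappa]\}_{\kappa \in A}$ is the irreducible $\mathfrak{S}$-partition of $S$.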

$P$ is evidently non-negative. Letting $t$ be the tree with one vertex and putting $A = \mathcal{K}$ gives $P(\mathcal{T}) = 1$. It is easy to see that $P$ is completely additive for the $\mathfrak{S}$-partitions of a neighborhood, but this implies $P$ is completely additive for the $\mathfrak{S}$-partitions of an arbitrary element $S$ of $\mathfrak{S}$. In order to show that $P$ is completely additive for any partition of $S$ into elements of $\mathfrak{S}$, it is necessary and sufficient to show this for an arbitrary partition of a neighborhood into neighborhoods. One may reach finer and finer partitions of a given neighborhood $N$ by replacing a neighborhood in any one partition by an $\mathfrak{S}$-partition of the neighborhood, and repeating the process. The sum of the measures of the sets in the partition is invariant under such a replacement. On the other hand it can be shown that all possible partitions of $N$ into neighborhoods may be reached in this way. More precisely, let $\mathcal{N} = \{N_i\}_{i \in I}$ be a partition of a neighborhood $N$ into neighborhoods $N_i$. We call $\mathcal{N}$ reduced if whenever a subset of $\mathcal{N}$ is an $\mathfrak{S}$-partition of a neighborhood $M \subset N$ then the partition consists of $M$ itself, i.e. it is the irreducible $\mathfrak{S}$-partition of $M$. Then we have the following theorem:

Theorem 1. If $\mathcal{N}$ is a reduced partition of a neighborhood $N$ into neighborhoods then $\mathcal{N} = \{N\}$.

The proof is indirect and proceeds by constructing a decreasing sequence of neighborhoods contained in $N$ whose limit is not void and yet has nothing in common with any $N_i$, but this is a contradiction.

Let $\mathfrak{F}$ consist of all those sets which may be formed by finite unions of disjoint elements of a half-ring $\mathfrak{S}$, then $\mathfrak{F}$ is a field of sets. If $P$ is a completely additive measure on $\mathfrak{S}$ then its natural extension $P_1$ is completely additive on $\mathfrak{F}$ [9]. Kolmogoroff [10] has shown that the completely additive measure $P_1$ may always be extended to a completely additive measure $P_2$ on the Borel field $\mathfrak{M}$, i.e. the smallest additive class of sets containing $\mathfrak{F}$. Since $P_2(\mathcal{T}) = 1$, $P_2$ is a probability measure. For simplicity we put $P_2 = P$. Let us also agree that if $M$ is the set of all trees with the property $R$ we may write $P(R)$ instead of $P(M)$. If $N$ is a set with $P(N) > 0$ then $P(M/N)$ shall denote the conditional probability of $M$, given $N$, i.e. $P(M/N) = (P(N))^{-1} P(MN)$.

4. Independence of the branches. In the multiplicative process the events 
occurring in one branch of a tree are independent of those in a second branch 
disjoint with the first and it is for this reason that the process is relatively simple 
to analyze. In this section we shall try to expose the character of this 
independence. 

For $T \in \mathcal{F}$, let $S_T$ be the set of all extensions of $T$, then

$S_T = \sum_{\kappa \in \mathcal{K}^{(m)}} [T, \kappa],$

whence by (6) and (7) $P(S_T) = p(T)$. The following lemma is then easily established.

Lemma 1. If $P(S_T) > 0$ then $W(t, e_i(T))$, $i = 1, 2, \cdots, m$, under the condition $t \in S_T$, are independent random variables each with the distribution

(8) $P(W(t, e_i(T)) = k / S_T) = p_k$, $k = 0, 1, 2, \cdots$.

In the particular case where $T = \{u_0\}$ we have $S_T = \mathcal{T}$ and we put $W(t) = W(t, u_0)$ for short. Thus $W(t)$ tells what type of vertex the root of $t$ is and (8) becomes

$P(W = k) = p_k$, $k = 0, 1, 2, \cdots$.

For $t \in \mathcal{T}$ and $n = 0, 1, 2, \cdots$ let $X_n(t)$ be the number of vertices of $t$ at distance $n$ from its root. Then $X_0(t) = 1$ and $X_1(t) = W(t)$. If $n$, $r$ are positive integers then there is at least one $T \in \mathcal{F}$ which has $r$ of its endpoints, say $e_{i_1}(T), e_{i_2}(T), \cdots, e_{i_r}(T)$, at distance $n$ from the root and which also satisfies $X_{n+1}(T) = 0$. Put

$S_T^{i_1 \cdots i_r} = \{t \mid W(t, e_j(T)) = 0, \ j \neq i_1, i_2, \cdots, i_r, \ t \in S_T\}.$

Evidently for $t \in S_T^{i_1 \cdots i_r}$

$X_{n+1}(t) = \sum_{\nu=1}^{r} W(t, e_{i_\nu}(T)),$

and a proof similar to that of Lemma 1 gives

Lemma 2. If $P(S_T^{i_1 \cdots i_r}) > 0$ then $X_{n+1}(t)$, under the condition $t \in S_T^{i_1 \cdots i_r}$, is the sum of $r$ independent random variables each with the distribution of $X_1$.

By (6) and (7) for $t \in \mathcal{M} \subset \mathcal{F} S_T$

(9) $P(t) = \prod_{\nu=0}^{\infty} p_\nu^{Y_\nu(t)}$

which depends only upon the type of each vertex as it occurs in $t$. For those vertices which are inner vertices of $T$, the type is constant. Any other vertex belongs to one and only one of $b(t, e_1(T)), b(t, e_2(T)), \cdots, b(t, e_m(T))$ and its type in $t$ is, of course, the same as its type in the branch to which it belongs. Furthermore, each branch is homeomorphic to just one tree in $\mathcal{F}$,

$b(t, e_i(T)) \leftrightarrow t_i$, $i = 1, 2, \cdots, m$.

Since the type of a vertex is preserved under homeomorphism we have

$P(t) = P(S_T) P(t_1) P(t_2) \cdots P(t_m).$

If, as $t$ runs through $\mathcal{M}$, $(t_1, t_2, \cdots, t_m)$ runs through $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_m$, we obtain

(10) $P(\mathcal{M}) = P(S_T) P(\mathcal{M}_1) P(\mathcal{M}_2) \cdots P(\mathcal{M}_m).$

Let us hereafter put $p = P(\mathcal{F})$. In the particular case of (10) where $\mathcal{M} = \mathcal{F} S_T$ we clearly have $\mathcal{M}_i = \mathcal{F}$, $i = 1, 2, \cdots, m$, hence

(11) $P(\mathcal{F} S_T) = P(S_T) \, p^m.$


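If we define $T_\nu$, $\nu = 0, 1, 2, \cdots$, to be the tree with $\nu + 1$ vertices which has $W(T_\nu) = \nu$ then

(12) $\mathcal{T} = \{u_0\} + \sum_{\nu=1}^{\infty} S_{T_\nu}$, $\quad \mathcal{F} = \{u_0\} + \sum_{\nu=1}^{\infty} \mathcal{F} S_{T_\nu},$

where

$S_{T_i} S_{T_j} = \emptyset \ (i \neq j)$, $\quad P(S_{T_\nu}) = p_\nu$, $\quad \nu = 1, 2, \cdots$.

From (11) and (12) we get

(13) $\sum_{\nu=0}^{\infty} p_\nu p^\nu = p.$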



For $t \in \mathcal{F} S_T$ let $Z(b(t, e_i(T)))$ be the number of vertices in the branch of $t$ at $e_i(T)$. In the particular case where $T = \{u_0\}$ we have $b(t, u_0) = t$ and $Z(t)$ is the number of vertices of $t$. If now

$\mathcal{F}_n = \{t \mid Z(t) = n\}_{t \in \mathcal{F}}$, $\quad n = 1, 2, \cdots$; $\quad P_n = P(\mathcal{F}_n),$

then by putting $\mathcal{M} = \mathcal{M}_T^{n_1 \cdots n_m}$ where

(14) $\mathcal{M}_T^{n_1 \cdots n_m} = \{t \mid Z(b(t, e_i(T))) = n_i, \ i = 1, 2, \cdots, m\}_{t \in \mathcal{F} S_T}$, $\quad \mathcal{M}_i = \mathcal{F}_{n_i}$, $\ i = 1, 2, \cdots, m,$

we may apply (10), which gives

(15) $P(t \in \mathcal{F} S_T, \ Z(b(t, e_i(T))) = n_i, \ i = 1, 2, \cdots, m) = P(S_T) P_{n_1} P_{n_2} \cdots P_{n_m}.$

If $p > 0$ we may multiply and divide the right hand member of (15) by $p^m$ which leads us to the following lemma:

Lemma 3. If $P(\mathcal{F} S_T) > 0$, then $Z(b(t, e_i(T)))$, $i = 1, 2, \cdots, m$, under the condition $t \in \mathcal{F} S_T$, are $m$ independent random variables each with the distribution of $Z(t)$, given $t \in \mathcal{F}$.


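Let $f(w)$ be the generating function of the distribution of $W$, i.e.

(16) $f(w) = \sum_{\nu=0}^{\infty} p_\nu w^\nu.$

If one is mainly interested in studying the distributions of $X_0, X_1, \cdots$, then one should define another sequence of functions $f_0, f_1, \cdots$ where $f_0(w) = w$ and $f_{n+1}(w) = f(f_n(w))$ for $n = 0, 1, 2, \cdots$. By computing formally the expansion of $f_n(w)$ around $w = 0$ it is not difficult to show that $f_n(w)$ is the generating function of $X_n$, i.e. $f_n(w) = \sum_{\nu=0}^{\infty} P(X_n = \nu) w^\nu$, which is the starting point of the previous investigations of the multiplicative process. Since we are mainly interested in the distribution of $Z$ we define $\mathcal{P}(z)$ to be the corresponding generating function, i.e.

(17) $\mathcal{P}(z) = \sum_{n=1}^{\infty} P_n z^n.$

Let $\rho$ and $\alpha$ denote the radii of convergence of (16) and (17) respectively; since both series have non-negative coefficients whose sum is at most 1, we have $\rho \geq 1$ and $\alpha \geq 1$.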
Theorem 2. Let

$\Phi(z, w) = z f(w) - w,$

then $w = \mathcal{P}(z)$ is the unique analytic solution of

(18) $\Phi(z, w) = 0$

in a certain neighborhood of $(0, 0)$.

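Proof. Since $\mathcal{P}(z)$ is analytic at 0 and $\mathcal{P}(0) = 0$ it suffices to show that if we substitute formally $\sum P_n z^n$ for $w$ in $z \sum p_\nu w^\nu$ the coefficient of $z^n$ is uniquely determined and is $P_n$.

(19) $z \sum_{\nu=0}^{\infty} p_\nu (\mathcal{P}(z))^\nu = p_0 z + \sum_{n=2}^{\infty} \left( \sum_{\nu=1}^{n-1} \ \sum_{n_1 + \cdots + n_\nu = n-1} p_\nu P_{n_1} P_{n_2} \cdots P_{n_\nu} \right) z^n.$

If in (14) we put $T = T_\nu$, where $T_\nu$ was defined just before (12), then $m = Y_0(T_\nu) = \nu$. Let us require in addition that the total number of vertices in the branches be $n - 1$, i.e. $n_1 + n_2 + \cdots + n_\nu = n - 1$, then

(20) $\mathcal{F}_n = \sum_{\nu=1}^{n-1} \ \sum_{\sum n_i = n-1} \mathcal{M}_{T_\nu}^{n_1 \cdots n_\nu}$, $\quad n = 2, 3, \cdots,$

where

$\mathcal{M}_{T_i}^{n_1 \cdots n_i} \mathcal{M}_{T_j}^{m_1 \cdots m_j} = \emptyset$

unless $i = j$ and $n_1 = m_1, n_2 = m_2, \cdots, n_i = m_i$. By applying $P$ to (20) and using (15) we get the coefficient of $z^n$ in (19) for $n \geq 2$. This together with the obvious fact that $P_1 = p_0$ completes the proof.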
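The functional equation of Theorem 2 determines the $P_n$ one after another, and this can be checked numerically. The sketch below is an illustration and not the author's procedure: instead of evaluating the double sum in (19) directly it iterates $w \mapsto z f(w)$ on truncated power series, each pass fixing at least one further coefficient because of the factor $z$. The function names and the input format (a list `p` with `p[v]` standing for $p_\nu$, only finitely many non-zero) are assumptions.

```python
from fractions import Fraction


def mul_trunc(a, b, n_max):
    """Multiply two coefficient lists, discarding powers above z**n_max."""
    out = [Fraction(0)] * (n_max + 1)
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            if i + j > n_max:
                break
            out[i + j] += ai * bj
    return out


def tree_size_distribution(p, n_max):
    """Return a list whose entry n is P_n for n = 1..n_max (entry 0 is 0)."""
    series = [Fraction(0)] * (n_max + 1)
    for _ in range(n_max):
        # Horner evaluation of f(series) = p_0 + series*(p_1 + series*(...)).
        acc = [Fraction(0)] * (n_max + 1)
        for coeff in reversed(p):
            acc = mul_trunc(acc, series, n_max)
            acc[0] += Fraction(coeff)
        # Multiply by z: shift every coefficient up by one power.
        series = [Fraction(0)] + acc[:n_max]
    return series
```

For $p_0 = p_2 = 1/2$ this yields $P_1 = 1/2$, $P_2 = 0$, $P_3 = p_2 P_1^2 = 1/8$, in agreement with the coefficient of $z^3$ in (19).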

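It is worthwhile noticing that by means of the formula of Burman and Lagrange [11] we can solve the recursion formula for $P_n$ in terms of $p_0, p_1, \cdots$, namely

(21) $P_n = \sum \frac{(n-1)!}{\nu_0! \, \nu_1! \cdots} \, p_0^{\nu_0} p_1^{\nu_1} \cdots$, the sum being extended over all $\nu_0, \nu_1, \cdots$ with $\sum \nu_i = n$, $\sum i \nu_i = n - 1$.

Now if $t$ has $n$ vertices we know from Euler's characteristic that $\sum_\nu \nu Y_\nu(t) = n - 1$. Since $P(t) = \prod_\nu p_\nu^{Y_\nu(t)}$ we see from (21) that

$\frac{(n-1)!}{\nu_0! \, \nu_1! \cdots}$, $\quad \sum \nu_i = n$, $\ \sum i \nu_i = n - 1,$

is the number of trees in $\mathcal{F}_n$ for which $Y_0(t) = \nu_0$, $Y_1(t) = \nu_1, \cdots$.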
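A closed form of this kind is easy to evaluate by machine. The sketch below, which is not part of the original paper, uses the equivalent coefficient form of the Lagrange inversion formula, $P_n = \frac{1}{n} [w^{n-1}] f(w)^n$, rather than the multinomial sum written out in (21); expanding $f(w)^n$ by the multinomial theorem gives the same terms. The function name and the list format of `p` are assumptions.

```python
from fractions import Fraction


def lagrange_tree_size(p, n):
    """P_n computed as (1/n) times the coefficient of w**(n-1) in f(w)**n.

    p[v] is assumed to hold p_v; only powers below w**n are kept.
    """
    power = [Fraction(1)] + [Fraction(0)] * (n - 1)  # f(w)**0 = 1
    for _ in range(n):
        nxt = [Fraction(0)] * n
        for i, a in enumerate(power):
            if a == 0:
                continue
            for j, b in enumerate(p):
                if i + j < n:
                    nxt[i + j] += a * Fraction(b)
        power = nxt
    return power[n - 1] / n
```

With $p_0 = p_2 = 1/2$ and $n = 5$ the only admissible exponents in (21) are $\nu_0 = 3$, $\nu_2 = 2$, giving $\frac{4!}{3! \, 2!} \cdot 2^{-5} = 1/16$, which is also what the function returns.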

Evidently $w = \mathcal{P}(z)$ remains a solution of (18) for all $z$ such that $|z| < \alpha$, $|w| < \rho$. In case $p_0 = 0$ the constant 0 solves (18). Hence $\mathcal{P}(z) = 0$ for all $z$ and so $\mathcal{P}(1) = p = 0$. Conversely, if $p = 0$ then $P_1 = p_0 = 0$ which gives

Corollary 1. $p = 0$ if and only if $p_0 = 0$.

Since we wish to investigate the distribution in $\mathcal{F}$ we shall henceforth assume $p_0 \neq 0$.

Any non-constant function $g(z)$ which has a power series development possessing non-negative coefficients, $g(z) = \sum a_\nu z^\nu$, $a_\nu \geq 0$, with a positive radius of convergence $R$ has two properties that are important for us:

(22) $g(z)$ has a singularity at $R$.

(23) If $\sum a_\nu R^\nu$ converges then $\sum a_\nu z_0^\nu$ converges absolutely and uniformly for $|z_0| = R$, and so the series defines a continuous function $g(z)$ there. We have $\lim_{z \to z_0} g(z) = \sum a_\nu z_0^\nu$ as long as the path of approach to $z_0$ lies in $|z| < R$. On the other hand, if as $z$ approaches $R$ through real values below $R$, $z \to R-$, the limit of $g(z)$ exists then $\sum a_\nu R^\nu$ converges. So if we put $g(R) = \lim_{z \to R-} g(z) = \sum a_\nu R^\nu$ then the meaning is unique even allowing $\infty$ as a value.
Returning to $\mathcal{P}(z)$, if for $|z| < \alpha$ we have $|w| < \rho$ where $w = \mathcal{P}(z)$, then

(24) $z = \frac{w}{f(w)} = \mathcal{P}^{-1}(w),$

which shows the mapping is schlicht in such a domain and that the image domain cannot contain zeros of $f(w)$. Because of (23) and the fact that $\mathcal{P}(1)$ is finite even if $\alpha = 1$ we see that the mapping is certainly one to one for $|z| \leq 1$.

Corollary 2. $p$ is the smallest root of $f(w) = w$ in $0 < w \leq 1$.

Proof. (13) shows $p$ is a root in the interval. If for $0 < w_0 \leq p$ we have $f(w_0) = w_0$ then by (24) $\mathcal{P}^{-1}(w_0) = 1$, hence $w_0 = \mathcal{P}(1) = p$.

The following corollary is the well known criterion for extinction.

Corollary 3. $p = 1$ if and only if $f'(1) \leq 1$.

Proof. $p = 1$, $p_0 > 0$, and the convexity of $f(w)$ in $0 < w < 1$ guarantee that $(f(w) - 1)/(w - 1)$ is bounded by 1 and is monotonic increasing with $w$. Hence $f'(1)$ exists and is $\leq 1$.

Conversely, if $f'(1) \leq 1$ then either $f'(w)$ is constant ($= p_1 < 1$) in $0 < w < 1$ or else it is strictly increasing with $w$ and in either case $f'(w) < 1$. The mean value theorem gives $f(w) > w$ in $0 < w < 1$, hence $p = 1$.
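Corollaries 2 and 3 suggest a simple numerical recipe for $p$. The following sketch is an illustration, not part of the original paper: it iterates $w \mapsto f(w)$ starting from 0, which produces the increasing sequence $f_n(0) = P(X_n = 0)$ and therefore converges to the smallest root of $f(w) = w$ in $0 < w \leq 1$. The function name, the tolerance and the list format of `p` are assumptions; in the critical case $f'(1) = 1$ the convergence is slow and the stopping rule only gives a rough value.

```python
def extinction_probability(p, tol=1e-12, max_iter=1_000_000):
    """Approximate the smallest root of f(w) = w in (0, 1] by iteration from 0.

    p[v] is assumed to hold p_v with p[0] > 0.
    """
    def f(w):
        # Horner evaluation of the generating function (16).
        total = 0.0
        for coeff in reversed(p):
            total = total * w + coeff
        return total

    w = 0.0
    for _ in range(max_iter):
        nxt = f(w)
        if abs(nxt - w) < tol:
            return nxt
        w = nxt
    return w
```

For $p_0 = 1/4$, $p_2 = 3/4$ we have $f'(1) = 3/2 > 1$ and the roots of $f(w) = w$ are $1/3$ and $1$, so the iteration settles at $1/3$; for $p_0 = 3/4$, $p_2 = 1/4$ we have $f'(1) = 1/2$ and it settles at 1.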

Putting $a = \mathcal{P}(\alpha)$ we have the following lemma:

Lemma 4. $a \leq \rho$.

Proof. We already know that $\mathcal{P}(z)$ has a unique analytic inverse given by (24) for $|\mathcal{P}(z)| < \rho$, but on the other hand $\mathcal{P}'(z) \neq 0$ for $0 < z < \alpha$ so this inverse is analytic for $0 < w < a$. If we had $a > \rho$ we could continue $f(w)$ analytically by means of (24) along the real axis past its singularity at $\rho$, but this is impossible.

Corollary. $p = 1$ if and only if $a \geq 1$.

Proof. The necessity follows from the monotone behavior of $\mathcal{P}(z)$ for $0 < z < \alpha$. Conversely, if $a \geq 1$ then $z = \mathcal{P}^{-1}(1) = 1$.

Theorem 3. If $p_0 + p_1 \neq 1$, then

(25) $\alpha$ and $a$ are finite;

(26) $f(a) = a/\alpha$;

(27) $f'(a) \leq 1/\alpha$ where the strict inequality can hold only if $a = \rho$.

Proof. Let $r \geq 2$ be such that $p_r \neq 0$, then for $0 < z < \alpha$, we get from the functional equation

$z p_r (\mathcal{P}(z))^r - \mathcal{P}(z) < 0;$

$0 < \mathcal{P}(z) < \left( \frac{1}{z p_r} \right)^{1/(r-1)}.$

By letting $z \to \alpha-$ we see $a$ is finite and $\mathcal{P}(z)$ is bounded. Since $\mathcal{P}(z)$ is monotonic in this region we get $\alpha < \infty$. By letting $z \to \alpha$ in $\Phi(z, \mathcal{P}(z))$ we get (26). For $0 < z < \alpha$, $\Phi_w(z, \mathcal{P}(z)) = z f'(\mathcal{P}(z)) - 1$ is continuous and monotonic increasing with $z$ and is $< 0$ for $z$ near 0. From the general theorem on implicit functions we know $\Phi_w(z, \mathcal{P}(z)) \neq 0$ for $|z| < \alpha$, so if we let $z \to \alpha$ we obtain (27).

If $a = \rho$, (27) merely guarantees the finiteness of $f'(\rho)$ and gives an upper bound. One can easily construct an example where $1/\alpha$ is the least upper bound and one where it is not.

But if $a < \rho$ then since $\Phi(z, w)$ is analytic at $(\alpha, a)$ and $\Phi(\alpha, a) = 0$ we obtain from the implicit function theorem the strict equality in (27).

Corollary. If $\alpha = 1$ then $a = p = 1$.

Proof. By (26)

(28) $f(a) = a = \mathcal{P}(1) = p \leq 1.$

If $a < \rho$ then $f'(p) = 1$ so $p = 1$ from the convexity of $f(w)$. If $a = \rho$ then $a \geq 1$ which when combined with (28) gives $a = 1$.

The case where $p_0 + p_1 = 1$ escapes Theorem 3 but it is easily examined separately, namely

$f(w) = p_0 + p_1 w$, $\quad p_0 \neq 0,$

$\mathcal{P}(z) = \sum_{n=1}^{\infty} p_0 p_1^{n-1} z^n = \frac{p_0 z}{1 - p_1 z}.$

Hence $p = 1$, $\alpha = 1/p_1$ and $a = \rho = \infty$.

For the practical applications of the theory it is valuable to know some conditions which guarantee $a < \rho$, and thus strict equality in (27). From the foregoing analysis it is evident that one such condition is $\rho = \infty$, i.e. $f(w)$ is an entire function, and another is $f'(1) > 1$. If one has enough information about $f(w)$ to plot its graph for real positive $w$ then the line through the origin tangent to $f(w)$ in the first quadrant touches the curve at the point $(a, a/\alpha)$ from which we determine both $a$ and $\alpha$.
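The tangent construction can also be carried out numerically. The sketch below is an illustration under stated assumptions and not part of the original paper: it supposes that $f$ is a polynomial (so $\rho = \infty$ and $a < \rho$), that $p_0 > 0$ and $p_0 + p_1 \neq 1$, and that the caller supplies an upper bracket `hi` beyond the point of tangency. It then solves $a f'(a) = f(a)$, i.e. (26) combined with equality in (27), by bisection and sets $\alpha = a / f(a)$. The function name and parameters are hypothetical.

```python
def tangent_point(p, hi, tol=1e-12):
    """Return (a, alpha) with a*f'(a) = f(a) and alpha = a/f(a).

    p[v] is assumed to hold p_v (finitely many terms); hi must satisfy
    hi*f'(hi) > f(hi).
    """
    def f(w):
        return sum(c * w ** k for k, c in enumerate(p))

    def f_prime(w):
        return sum(k * c * w ** (k - 1) for k, c in enumerate(p) if k >= 1)

    def g(w):
        # g(0) = -p_0 < 0 and g is increasing for w > 0, since g'(w) = w*f''(w).
        return w * f_prime(w) - f(w)

    lo = 0.0
    if g(hi) <= 0:
        raise ValueError("hi does not bracket the point of tangency")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, a / f(a)
```

For $p_0 = 1/4$, $p_2 = 3/4$ the equation $a f'(a) = f(a)$ reads $\tfrac{3}{2} a^2 = \tfrac{1}{4} + \tfrac{3}{4} a^2$, so $a = 1/\sqrt{3}$ and $\alpha = a / f(a) = 2/\sqrt{3}$.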

6. Asymptotic properties of the distributions. If we examine the terms of the sequence $p_0, p_1, \cdots$ we may find that the indices of the non-zero terms are all multiples of some common integer larger than 1. In this case we should expect to have $P_n = 0$ with the same sort of regularity. So let us define $q$ to be the largest integer such that $p_\nu \neq 0$ implies $\nu$ is a multiple of $q$. Clearly we have $q \geq 1$ and $q = 1$ means there is no integer other than 1 which divides the indices of all the non-zero $p_\nu$. Of course, $p_1 \neq 0$ implies $q = 1$. The following theorem establishes an asymptotic estimate for $P_n$ valid for large $n$, provided $n - 1$ is a multiple of $q$, and incidentally shows that $P_n = 0$ if $n - 1$ is not a multiple of $q$.

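Theorem 4. If $a < \rho$ then

(29) $P_n = q \left( \frac{a}{2 \pi \alpha f''(a)} \right)^{1/2} \alpha^{-n} n^{-3/2} + O(\alpha^{-n} n^{-5/2})$, $\quad n \equiv 1 \pmod q$;

$P_n = 0$, $\quad n \not\equiv 1 \pmod q$,

i.e. for large $n \equiv 1 \pmod q$

$P_n \approx q \left( \frac{a}{2 \pi \alpha f''(a)} \right)^{1/2} \alpha^{-n} n^{-3/2}.$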

Proof. Let us put $\theta = 2\pi/q$, then for $|w| \leq a$,

$|f(w)| = \left| \sum_k p_{kq} w^{kq} \right| \leq \sum_k p_{kq} |w|^{kq} = f(|w|),$

and the equality evidently holds if and only if $\arg w$ is an integral multiple of $\theta$. Furthermore, if $w$ is such that $|f(w)| = f(|w|)$ and we put $z = \mathcal{P}^{-1}(w)$ then $w = \mathcal{P}(w/f(w))$ so we get

$\mathcal{P}(z e^{i\theta}) = e^{i\theta} \mathcal{P}(z),$

hence $P_n = 0$, if $n \not\equiv 1 \pmod q$.

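For $|z| = \alpha$ and $w = \mathcal{P}(z)$ the point $(z, w)$ satisfies (18) by (23). If we put

$$z_\nu = \alpha e^{i\nu\theta}, \qquad w_\nu = a e^{i\nu\theta}, \qquad \nu = 0, 1, \cdots, q - 1,$$

then $w_\nu = \mathcal{P}(z_\nu)$ and

$$\Phi_w(z_\nu, w_\nu) = z_\nu f'(w_\nu) - 1 = \alpha f'(a) - 1 = 0,$$

so that $z_0, z_1, \cdots, z_{q-1}$ are certainly singularities of $\mathcal{P}(z)$. But $f(w)$ is analytic
at $w_\nu$ and $f(w_\nu) = a/\alpha \ne 0$, so the solution of (18) for $z$,

$$z = \mathcal{P}^{-1}(w) = \frac{w}{f(w)},$$

is analytic at $w_\nu$. Furthermore

$$\frac{d}{dw}\mathcal{P}^{-1}(w_\nu) = \frac{f(w_\nu) - w_\nu f'(w_\nu)}{f(w_\nu)^2} = 0,$$

$$\frac{d^2}{dw^2}\mathcal{P}^{-1}(w_\nu) = -\frac{w_\nu f''(w_\nu)}{f(w_\nu)^2} \ne 0,$$

which shows that $\mathcal{P}(z)$ has a branch point of order 1 at each $z_\nu$, i.e. $\mathcal{P}(z)$ is an
analytic function of $(z - z_\nu)^{1/2}$ in the neighborhood of $(z_\nu, w_\nu)$, $\nu = 0, 1, \cdots, q - 1$.
For $|z| = \alpha$, $w = \mathcal{P}(z)$ but $z \ne z_\nu$ we obtain

$$|\Phi_w(z, w)| \ge 1 - \alpha |f'(w)| \ge 1 - \alpha f'(|w|) > 1 - \alpha f'(a) = 0,$$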



hence $\mathcal{P}(z)$ is an analytic function of $z$ in a certain neighborhood of such a
pair $(z, w)$.

By analytic continuation we find a circle of radius $\beta > \alpha$ such that $\mathcal{P}(z)$ is an
analytic function of $(z - z_\nu)^{1/2}$ for $|z| < \beta$. If we make radial cuts in this
circle running outward from each $z_\nu$ then in the resulting domain $D$ each of the
functions $(z - z_\nu)^{1/2}$ is an analytic function of $z$, hence so is $\mathcal{P}(z)$.

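Let $\Gamma$ be the path consisting of the boundary of $D$ oriented in the positive
sense, let $\gamma$ be the part of $\Gamma$ lying in the sector $-\pi/q < \arg z < \pi/q$,
and let $\gamma'$ be that part of $\gamma$ leading from $\beta$ to $\alpha$ along the lower lip of the cut at
$\alpha$, thence along the upper lip back to $\beta$. Since $\mathcal{P}(z)$ satisfies the relation
$\mathcal{P}(e^{i\nu\theta} z) = e^{i\nu\theta}\mathcal{P}(z)$ for $\nu = 0, 1, \cdots, q - 1$, we see from Cauchy's formula that

$$P_n = \frac{1}{2\pi i}\int_\Gamma \frac{\mathcal{P}(z)}{z^{n+1}}\,dz = \frac{A}{2\pi i}\int_\gamma \frac{\mathcal{P}(z)}{z^{n+1}}\,dz,$$

where

$$A = \sum_{\nu=0}^{q-1} e^{i\nu\theta(1-n)} = \begin{cases} 0, & n \not\equiv 1 \pmod q; \\ q, & n \equiv 1 \pmod q. \end{cases}$$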


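Restricting ourselves to $n \equiv 1 \pmod q$ we put

$$\mathcal{P}(z) = a + b(z - \alpha)^{1/2} + c(z - \alpha) + (z - \alpha)^{3/2} S(z),$$

where $S(z)$ is analytic in $D$. Then $P_n = B + C$, where

$$B = \frac{q}{2\pi i}\int_\gamma \frac{a + b(z - \alpha)^{1/2} + c(z - \alpha)}{z^{n+1}}\,dz, \qquad C = \frac{q}{2\pi i}\int_\gamma \frac{(z - \alpha)^{3/2} S(z)}{z^{n+1}}\,dz.$$

We find

$$B = \frac{q}{2\pi i}\int_{\gamma'} \frac{b(z - \alpha)^{1/2}}{z^{n+1}}\,dz + O(\beta^{-n}) = q b i \sqrt{\alpha}\,(-1)^n \binom{1/2}{n} \alpha^{-n} + O(\beta^{-n}),$$

$$|C| = O\left(\alpha^{-n}\left|\binom{3/2}{n}\right|\right).$$

The constant $b$ is determined from the equations

$$w - a = b(z - \alpha)^{1/2} + \cdots, \qquad z - \alpha = -\frac{\alpha^2 f''(a)}{2a}(w - a)^2 + \cdots.$$

Using the fact that

$$\left|\binom{1/2}{n}\right| = (4\pi n^3)^{-1/2} + O(n^{-5/2}), \qquad \left|\binom{3/2}{n}\right| = O(n^{-5/2}),$$

we finally obtain (29) as desired.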





Thus $P_n$ approaches zero a little faster than exponentially with $n$ regardless
of whether $p = 1$ or $p < 1$, except for the special case when $\alpha = 1$. In this
case it is interesting that, according to the corollary to Lemma 4, $p = 1$.

The case where $q \ne 1$ is of no practical importance since one can always bring
$q$ back to 1 by making a very small decrease in one of the non-zero $p_\nu$ and
increasing $p_1$ by the same amount. This can clearly be done so that none of the
important characteristics of $f(w)$ is changed appreciably.
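
The estimate of Theorem 4 can be checked numerically. The following is only a sketch under stated assumptions: it takes (29) to read $P_n \approx q\,(a/(2\pi\alpha f''(a)))^{1/2}\alpha^{-n}n^{-3/2}$ for $n \equiv 1 \pmod q$, it assumes an offspring law with finitely many non-zero $p_k$, and the function names (`tree_size_probs`, `otter_constants`, `otter_asymptotic`) are illustrative and not taken from the paper. The exact $P_n$ are obtained by iterating $\mathcal{P}(z) = z f(\mathcal{P}(z))$ on truncated power series.

```python
import math
from functools import reduce


def tree_size_probs(p, n_max):
    """Return [P_0, ..., P_{n_max}], the Taylor coefficients of the solution of
    P(z) = z * f(P(z)), where f(w) = sum_k p[k] * w**k."""
    P = [0.0] * (n_max + 1)
    for _ in range(n_max):              # each pass fixes one more coefficient
        acc = [0.0] * (n_max + 1)       # Horner evaluation of f(P(z)), truncated
        for coef in reversed(p):
            nxt = [0.0] * (n_max + 1)
            for i, a_i in enumerate(acc):
                if a_i:
                    for j in range(1, n_max + 1 - i):
                        nxt[i + j] += a_i * P[j]
            nxt[0] += coef
            acc = nxt
        P = [0.0] + acc[:n_max]         # multiply by z
    return P


def otter_constants(p):
    """Solve a * f'(a) = f(a) for a > 0 by bisection; return (a, alpha, q)."""
    f = lambda w: sum(c * w**k for k, c in enumerate(p))
    df = lambda w: sum(k * c * w**(k - 1) for k, c in enumerate(p) if k >= 1)
    h = lambda w: w * df(w) - f(w)      # increasing for w > 0, h(0) = -p_0 < 0
    lo, hi = 0.0, 1.0
    while h(hi) < 0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    alpha = a / f(a)                    # from f(a) = a / alpha
    q = reduce(math.gcd, [k for k, c in enumerate(p) if c > 0])
    return a, alpha, q


def otter_asymptotic(p, n):
    """Leading term of the asymptotic estimate, as assumed in the lead-in."""
    a, alpha, q = otter_constants(p)
    if (n - 1) % q:
        return 0.0
    d2f = sum(k * (k - 1) * c * a**(k - 2) for k, c in enumerate(p) if k >= 2)
    return q * math.sqrt(a / (2 * math.pi * alpha * d2f)) * alpha**(-n) * n**-1.5


p = [0.3, 0.4, 0.3]                     # illustrative offspring law, q = 1
P = tree_size_probs(p, 120)
for n in (10, 40, 120):
    print(n, P[n], otter_asymptotic(p, n))
```

The ratio of the two printed columns should approach 1 as $n$ grows; with `p = [0.5, 0.0, 0.5]` one has $q = 2$ and the even-indexed $P_n$ vanish.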


7. The limiting distributions of $W(t)$ and $n^{-1}Y_k(t)$ for $t \in \mathcal{F}_n$. Let us
momentarily drop the condition $p_0 \ne 0$. The characteristic function of $W$ is

(30)
$$\int e^{i\theta W}\,dP = f(e^{i\theta}),$$

so that for the $r$th moment of $W$ we have

(31)
$$E(W^r) = i^{-r}\left[\frac{d^r f(e^{i\theta})}{d\theta^r}\right]_{\theta = 0}, \qquad r = 0, 1, 2, \cdots.$$

For the first and second moments we obtain

$$E(W) = f'(1),$$

$$E(W^2) = f'(1) + f''(1),$$

which shows that the criterion for extinction (Corollary 3 to Theorem 2) may be
stated as follows: the multiplicative process is almost certain to expire if and
only if $E(W) \le 1$. From (30) we see that all the moments of $W$ will be finite
as soon as $\rho > 1$; but if $\rho = 1$ no general statement can be made, except in case
$\alpha = 1$ also, for indeed $\alpha = 1$ implies $a = 1$ so by (31) and (27) $E(W) = f'(1) \le 1$.

We now reassume $p_0 \ne 0$. Since the variables $Z, Y_0, Y_1, \cdots$ are restricted to
$t \in \mathcal{F}$ it is convenient to see what happens to $W$ in $\mathcal{F}$. If we define $g(w) = p^{-1} f(pw)$
then (13) shows $g(w)$ and $g(e^{i\theta})$ are the generating function and characteristic
function respectively for $W$, given $t \in \mathcal{F}$. Thus we see immediately that the
first moment of $W$, given $\mathcal{F}$, is always $\le 1$, and all its moments are finite if $p < 1$.
In case $0 < p < 1$ we may also introduce $h(w)$ defined by

(32)
$$f(w) = p\,g(w) + (1 - p)\,h(w),$$

then $h(w)$ is obviously the generating function of $W$, given $\mathcal{I}$. Here the $r$th
moment is finite whenever the $r$th moment of $W$ is finite. (32) gives

$$P(W = k/\mathcal{I}) = p_k\,\frac{1 - p^k}{1 - p}, \qquad k = 1, 2, \cdots.$$

It would be interesting to be able to compare this with the corresponding thing
for large finite trees and in this connection we have the following theorem:

Theorem 5. If $a < \rho$ and $q = 1$,

$$\lim_{n \to \infty} P(W = k/\mathcal{F}_n) = \alpha k p_k a^{k-1}, \qquad k = 1, 2, \cdots.$$





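Proof. By expanding $z f(e^{i\theta}\mathcal{P}(z))$ in powers of $z$ we obtain

(33)
$$z f(e^{i\theta}\mathcal{P}(z)) = \sum_{n=1}^{\infty} \varphi_n(\theta) z^n,$$

where

$$\varphi_n(\theta) = \sum_k e^{ik\theta} p_k \sum_{n_1 + \cdots + n_k = n - 1} P_{n_1} P_{n_2} \cdots P_{n_k},$$

so that if $P_n \ne 0$ then $P_n^{-1}\varphi_n(\theta)$ is the characteristic function of $W$, given $\mathcal{F}_n$.
From (33) we get

$$\varphi_n(\theta) = \frac{1}{2\pi i}\int_\Gamma \frac{f(e^{i\theta}\mathcal{P}(z))}{z^n}\,dz.$$

Since $a < \rho$ we may expand $f(e^{i\theta}\mathcal{P}(z))$ about the point $\mathcal{P}(z) = a$ and integrate
as in the proof of Theorem 4, thus

$$P_n^{-1}\varphi_n(\theta) = \alpha e^{i\theta} f'(a e^{i\theta}) + \epsilon_n(\theta).$$

Since $\epsilon_n(\theta) \to 0$ as $n \to \infty$,

$$\lim_{n \to \infty} P_n^{-1}\varphi_n(\theta) = \alpha e^{i\theta} f'(a e^{i\theta}),$$

the limit function obviously being the characteristic function for the distribution
whose generating function is $\alpha w f'(aw)$, from which the theorem follows directly.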
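
As an illustration of Theorem 5, the conditional law of $W$ can be evaluated exactly for Poisson offspring (the setting of Example 2 below) and compared with the stated limit $\alpha k p_k a^{k-1}$. This is a sketch resting on two assumptions that are not part of the text: the Lagrange inversion identity $[z^m]\,\mathcal{P}(z)^k = (k/m)\,[w^{m-k}]\,f(w)^m$, and the identity $P(W = k/\mathcal{F}_n) = p_k\,[z^{n-1}]\mathcal{P}(z)^k / P_n$ read off from (33). The helper names are hypothetical.

```python
import math


def log_power_coeff(lam, k, m):
    """log of [z^m] P(z)^k for Poisson(lam) offspring, by Lagrange inversion:
    (k/m) * exp(-m*lam) * (m*lam)**(m-k) / (m-k)!"""
    return (math.log(k / m) - m * lam
            + (m - k) * math.log(m * lam) - math.lgamma(m - k + 1))


def root_degree_given_size(lam, k, n):
    """P(W = k | the tree has n vertices), for n >= 2 and 1 <= k <= n - 1."""
    log_pk = -lam + k * math.log(lam) - math.lgamma(k + 1)
    log_Pn = log_power_coeff(lam, 1, n)          # P_n = [z^n] P(z)
    return math.exp(log_pk + log_power_coeff(lam, k, n - 1) - log_Pn)


def limit_law(lam, k):
    """alpha * k * p_k * a**(k-1) with a = 1/lam and alpha = exp(lam-1)/lam."""
    a = 1.0 / lam
    alpha = math.exp(lam - 1.0) / lam
    pk = math.exp(-lam) * lam**k / math.factorial(k)
    return alpha * k * pk * a**(k - 1)


lam = 0.7
for k in (1, 2, 3, 4):
    print(k, root_degree_given_size(lam, k, 5000), limit_law(lam, k))
```

For Poisson offspring the limit reduces to $e^{-1}/(k-1)!$ whatever the value of $\lambda$, i.e. one plus a Poisson variable of mean 1.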
Now $\mathcal{P}(z)/p$ is the generating function for $Z$, given $\mathcal{F}$, and the function solves

(34)
$$z g(w) - w = 0$$

for $|z| \le \alpha$. We find for the $r$th moment of $Z$

$$E(Z^r/\mathcal{F}) = \frac{1}{i^r p}\left[\frac{d^r \mathcal{P}(e^{i\theta})}{d\theta^r}\right]_{\theta = 0}, \qquad r = 0, 1, \cdots,$$

hence all the moments are finite as soon as $\alpha > 1$. Since by (34)

$$\frac{dw}{dz} = \frac{g(w)}{1 - z g'(w)} = \frac{w}{z(1 - z g'(w))},$$

we obtain for the first moment

$$E(Z/\mathcal{F}) = \left[\frac{dw}{dz}\right]_{z = 1} = \frac{1}{1 - f'(p)}.$$

In a similar way one can express any moment of $Z$, provided it is finite, in terms
of $f'(p)$, $f''(p)$, etc. If $\alpha = 1$ we see from the corollary to Theorem 3 that even
the first moment of $Z$ is infinite, except for the special case where $\rho = 1$ and
$f'(1) < 1$.

The characteristic function of $Y_k$, given $\mathcal{F}$, is





where by (21) 

M) = f dP = 2 e' ,lt p 0 ,B Pi M . • •. 

Thus, if $P_n \ne 0$, $P_n^{-1}\psi_{kn}(\theta)$ is the characteristic function of $Y_k$, given $\mathcal{F}_n$. If
$p_k = 0$ then $\psi_k(\theta) = 1$. If $p_k \ne 0$ put $p_k = e^{q_k}$, then

fl(r) » 

(35) 7- r m) = 0)*T, 

^®b n-l 

hence 


igsPW-fiCnviri, 

which shows that all moments of $Y_k$ are finite if $\alpha > 1$. Let us put $w = \mathcal{P}(z)$
for short, then, by (18),

(36)
$$\frac{\partial w}{\partial q_k} = \frac{z p_k w^k}{1 - z f'(w)} = z^2 p_k w^{k-1}\,\mathcal{P}'(z),$$

which gives for the first moment of $Y_k$,

$$E(Y_k/\mathcal{F}) = \frac{p_k p^{k-1}}{1 - f'(p)} = p_k p^{k-1} E(Z/\mathcal{F}),$$

which is to be expected since $p_k p^{k-1}$ plays the same role in $\mathcal{F}$ that $p_k$ plays in $\mathcal{T}$.
We may also expect that for $t \in \mathcal{F}_n$, $n^{-1}Y_k$ should be closely related to $p_k$. This
question is settled by the following theorem:

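Theorem 6. If $a < \rho$ and $q = 1$ then for $x$ real

$$\lim_{n \to \infty} P(n^{-1}Y_k < x/\mathcal{F}_n) = \begin{cases} 1, & \text{if } x > \alpha p_k a^{k-1}, \\ 0, & \text{if } x < \alpha p_k a^{k-1}. \end{cases}$$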

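Proof. We intend to estimate the $r$th moment of $n^{-1}Y_k$ for $t \in \mathcal{F}_n$ and $n$
very large from (35) by means of the contour integral

(37)
$$E(n^{-r}Y_k^r/\mathcal{F}_n) = \frac{1}{2\pi i\,n^r P_n}\int_\Gamma \frac{\partial^r \mathcal{P}(z)}{\partial q_k^r}\,\frac{dz}{z^{n+1}}.$$

So let us put

$$w = \mathcal{P}(z), \qquad w_r^{(s)} = \frac{\partial^{r+s} w}{\partial q_k^r\,\partial z^s}, \qquad r, s = 0, 1, \cdots,$$

writing $w_r$ for $w_r^{(0)}$ and $w^{(s)}$ for $w_0^{(s)}$; then by (36) $w_1 = z^2 p_k w^{k-1} w^{(1)}$ and by Leibnitz formula, provided $k \ne 0$,

(38)
$$w_r = z^2 p_k \sum_{\nu_0 + \nu_1 + \cdots + \nu_k = r - 1} \frac{(r-1)!}{\nu_0!\,\nu_1! \cdots \nu_k!}\; w_{\nu_1} w_{\nu_2} \cdots w_{\nu_{k-1}}\, w_{\nu_k}^{(1)}.$$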




The principal contribution to the integral in (37) will come from the term of
(38) which has the largest size for $z$ near $\alpha$. If we put $\zeta = (z - \alpha)^{1/2}$ then $w$ is
regular at $\zeta = 0$ and so is the constant $p_k$. Let us assume that $w_\nu$ has a pole of
order $2\nu - 1$ at $\zeta = 0$ for $\nu = 1, 2, \cdots, r - 1$, which is clearly true for $\nu = 1$.
Then if $s$ is the number of $\nu_1, \nu_2, \cdots, \nu_{k-1}$ which are $= 0$, the order of the pole
of the general term of (38) at $\zeta = 0$ is

$$\sum_{i=1}^{k-1} (2\nu_i - 1) + s + 2\nu_k + 1 = 2(r - \nu_0) - (k - s),$$

which has the maximum value $2r - 1$ if and only if $\nu_0 = \nu_1 = \cdots = \nu_{k-1} = 0$,
$\nu_k = r - 1$. Hence

(39)
$$w_r = z^2 p_k w^{k-1} w_{r-1}^{(1)} + \zeta^{-2r+2}\,\Omega_1(\zeta),$$

where $\Omega_1(\zeta)$ is a regular function of $\zeta$ at zero. For $k = 0$ the formula (38) is not
correct but it is easy to see directly that (39) is correct for $k = 0$. If we differentiate
(39) with respect to $z$ and put $r - 1$ for $r$ we obtain

$$w_{r-1}^{(1)} = z^2 p_k w^{k-1} w_{r-2}^{(2)} + \zeta^{-2r+2}\,\Omega_2(\zeta),$$

hence

$$w_r = (z^2 p_k w^{k-1})^r\, w^{(r)} + \zeta^{-2r+2}\,\Omega(\zeta).$$

Substituting in (37) and estimating in a manner similar to that employed
previously we obtain

$$E(n^{-r}Y_k^r/\mathcal{F}_n) = \frac{1}{2\pi i\,n^r P_n}\int_\Gamma \frac{(p_k w^{k-1})^r\,\mathcal{P}^{(r)}(z)}{z^{n-2r+1}}\,dz + \frac{1}{2\pi i\,n^r P_n}\int_\Gamma \frac{(z - \alpha)^{1-r}\,\Omega((z - \alpha)^{1/2})}{z^{n+1}}\,dz$$

$$= (p_k a^{k-1})^r\,\frac{P_{n-r}}{P_n}\,\frac{(n - r)(n - r - 1) \cdots (n - 2r + 1)}{n^r} + O(n^{-1/2}),$$

and finally

$$\lim_{n \to \infty} E(n^{-r}Y_k^r/\mathcal{F}_n) = (\alpha p_k a^{k-1})^r.$$

The limit of the $r$th moment is itself the $r$th moment of the distribution on the
real line which has all its mass at the point $\alpha p_k a^{k-1}$. Since this distribution is
uniquely determined by its moments, a well known theorem [7] enables us to
conclude that our sequence of distributions has this distribution as limit and
this is equivalent to what is claimed by the theorem.

It is important to notice that if we put the mass $\alpha p_k a^{k-1}$ at the point $k$ this
determines a distribution on the real line because of (26).


8. The estimation of $p$. If we wish to estimate $p$ when we know $p \ne 0$, we
may obtain an estimate from the knowledge of $f(w)$ in $0 < w < 1$, using the
method of iteration. That is we choose a function $G(w)$ such that $G(p) = p$
and $|G(w) - p| < |w - p|$ for $0 < w < 1$. Then if for any $w_0$ in the open
interval we compute successively $w_1, w_2, \cdots$, where $w_{n+1} = G(w_n)$ for $n \ge 0$,
we are sure that $w_n$ converges exponentially to $p$ as $n \to \infty$.

Obviously $f(w)$ itself has the properties of $G(w)$ but we achieve faster
convergence towards $p$ using Newton's method, that is if we put

(40)
$$f_1(w) = f(w) - w, \qquad G(w) = w - \frac{f_1(w)}{f_1'(w)}.$$

If for some reason we expect $p$ to be close to 1 then it is better to put

$$f_2(w) = \frac{f(w) - w}{w - 1}$$

and use $f_2(w)$ in (40) instead of $f_1(w)$, for then we may choose $w_0 = 1$.
Let us put $f'(1) = 1 + \epsilon$, $\epsilon > 0$, then

$$f_2(1 + h) = \frac{f(1 + h) - 1}{h} - 1 \to \epsilon, \qquad h \to 0;$$

$$f_2'(1 + h) = \lim_{k \to 0}\left(\frac{f(1 + h + k) - 1}{k(h + k)} - \frac{f(1 + h) - 1}{kh}\right);$$

$$f_2'(1) = \lim_{h \to 0}\left(\frac{f(1 + 2h) - 2f(1 + h) + 1}{2h^2}\right) = \frac{f''(1)}{2}.$$

Hence

(41)
$$p \approx w_1 = 1 - \frac{2\epsilon}{f''(1)}.$$

This result was previously established by Kolmogoroff [7].
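
The two iterations described in this section can be sketched as follows. The code assumes Poisson offspring, $f(w) = e^{\lambda(w-1)}$, purely for illustration; the function names and the starting value $w_0 = 0$ are choices made here, not prescriptions of the text. Newton's method is applied to $f_1(w) = f(w) - w$, and the first-order estimate $1 - 2\epsilon/f''(1)$ is compared with the computed root for $\lambda$ close to 1.

```python
import math


def newton_extinction(f, df, w0=0.0, tol=1e-14, max_iter=100):
    """Newton iteration w <- w - f1(w)/f1'(w) with f1(w) = f(w) - w.
    Starting at w0 = 0 (an assumption of this sketch) keeps the iterates to the
    left of the smallest positive root when f is convex."""
    w = w0
    for _ in range(max_iter):
        step = (f(w) - w) / (df(w) - 1.0)
        w -= step
        if abs(step) < tol:
            return w
    raise RuntimeError("Newton iteration did not converge")


def plain_iteration(f, w0=0.0, n_steps=50):
    """The simple iteration w_{n+1} = f(w_n)."""
    w = w0
    for _ in range(n_steps):
        w = f(w)
    return w


def poisson_pgf(lam):
    """Generating function of a Poisson(lam) law and its derivative."""
    f = lambda w: math.exp(lam * (w - 1.0))
    df = lambda w: lam * math.exp(lam * (w - 1.0))
    return f, df


f, df = poisson_pgf(2.0)
p_newton = newton_extinction(f, df)
p_plain = plain_iteration(f)
print(p_newton, p_plain)

# Near-critical case: compare with the estimate 1 - 2*eps/f''(1),
# where eps = lam - 1 and f''(1) = lam**2 for the Poisson law.
lam = 1.05
f_c, df_c = poisson_pgf(lam)
p_exact = newton_extinction(f_c, df_c)
p_approx = 1.0 - 2.0 * (lam - 1.0) / lam**2
print(p_exact, p_approx)
```

Newton's method needs only a handful of steps here, whereas the plain iteration contracts by a factor of about $f'(p)$ per step.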

The following two simple examples display the results of the general theory.

Example 1. We take $f(w) = p_0 + p_1 w + p_2 w^2$ where $p_0 + p_1 + p_2 = 1$
and $p_0, p_2 > 0$. We have $\rho = \infty$. From the equations (26) and (27),

$$f(a) = p_0 + p_1 a + p_2 a^2 = \frac{a}{\alpha}, \qquad f'(a) = p_1 + 2p_2 a = \frac{1}{\alpha},$$

we obtain easily

$$a = \sqrt{p_0 p_2^{-1}}, \qquad \alpha^{-1} = p_1 + 2\sqrt{p_0 p_2},$$

and it is evident that $\alpha > 1$ is equivalent to $p_0 \ne p_2$, which is equivalent to
$f'(1) = p_1 + 2p_2 \ne 1$. Now

$$\Phi(z, w) = z p_0 + (z p_1 - 1)w + z p_2 w^2,$$

hence

(42)
$$\mathcal{P}(z) = \frac{1 - z p_1 - \sqrt{(1 - z p_1)^2 - 4z^2 p_0 p_2}}{2 z p_2},$$

the choice of the sign of the radical being determined by letting $z \to 0$,

$$p = \mathcal{P}(1) = \frac{p_0 + p_2 - \sqrt{(p_0 - p_2)^2}}{2p_2} = \begin{cases} 1, & p_0 \ge p_2; \\ p_0 p_2^{-1}, & p_0 < p_2. \end{cases}$$

In the case $p_1 > 0$ we have $q = 1$ and then, by (21)

$$P_n = \sum_{\nu_1 + 2\nu_2 = n - 1} \frac{1}{n}\,\frac{n!}{\nu_0!\,\nu_1!\,\nu_2!}\; p_0^{\nu_0} p_1^{\nu_1} p_2^{\nu_2},$$

which can also be obtained by expansion of (42) according to powers of $z$. From
(29) we get

$$P_n \sim \frac{1}{2\sqrt{\pi}}\left(p_1\sqrt{p_0 p_2^{-3}} + 2p_0 p_2^{-1}\right)^{1/2}\left(p_1 + 2\sqrt{p_0 p_2}\right)^n n^{-3/2}.$$

In the case $p_1 = 0$ we have $q = 2$ and obtain from (42) or from (21)

$$\mathcal{P}(z) = \sum_{r=1}^{\infty} (-1)^{r-1}\binom{1/2}{r}\,2^{2r-1} p_0^r p_2^{r-1} z^{2r-1} = \sum_{r=1}^{\infty} \frac{(2r - 2)!}{r!\,(r - 1)!}\; p_0^r p_2^{r-1} z^{2r-1},$$

which shows

$$P_n = \begin{cases} 0, & n = 2r; \\[1mm] \dfrac{(2r - 2)!}{r!\,(r - 1)!}\; p_0^r p_2^{r-1}, & n = 2r - 1. \end{cases}$$

By direct use of Stirling's formula or from (29) we get

$$P_{2r-1} \sim \frac{1}{p_2}\sqrt{\frac{2}{\pi}}\; 2^{2r-1} (p_0 p_2)^r (2r - 1)^{-3/2}.$$

Pi V ir 


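Example 2. We take $f(w) = e^{\lambda(w - 1)}$, $\lambda > 0$, so that $W$ has a Poisson
distribution. Then $\rho = \infty$, $q = 1$, and we get from (26) and (27)

$$f(a) = e^{\lambda(a - 1)} = a/\alpha, \qquad f'(a) = \lambda e^{\lambda(a - 1)} = 1/\alpha,$$

$$a = 1/\lambda, \qquad \alpha = e^{\lambda - 1}/\lambda.$$

Clearly we have $a > 1$ if and only if $\lambda < 1$ and in this case 1 is evidently the
only solution for $w$ of $e^{\lambda(w - 1)} = w$, hence $p = 1$. On the other hand if $\lambda > 1$
then (41) gives $p \approx 1 - 2(\lambda - 1)\lambda^{-2}$. By (21) we get

$$P_n = \frac{(\lambda n)^{n-1}}{n!}\,e^{-n\lambda},$$

and by direct use of Stirling's formula or from (29) we get

$$P_n \sim \frac{1}{\lambda\sqrt{2\pi}}\left(\lambda e^{1 - \lambda}\right)^n n^{-3/2}.$$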
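
The Poisson case lends itself to a direct numerical comparison. The sketch below assumes the closed form $P_n = (\lambda n)^{n-1} e^{-n\lambda}/n!$ and the Stirling-type estimate $P_n \sim (\lambda\sqrt{2\pi})^{-1}(\lambda e^{1-\lambda})^n n^{-3/2}$; both are evaluated on a logarithmic scale to avoid overflow, and the function names are illustrative only.

```python
import math


def log_tree_size_pmf(lam, n):
    """log P_n for Poisson(lam) offspring: (lam*n)**(n-1) * exp(-lam*n) / n!"""
    return (n - 1) * math.log(lam * n) - lam * n - math.lgamma(n + 1)


def log_asymptotic(lam, n):
    """log of (lam * e**(1 - lam))**n * n**(-3/2) / (lam * sqrt(2*pi))."""
    return (n * (math.log(lam) + 1.0 - lam)
            - 1.5 * math.log(n)
            - math.log(lam * math.sqrt(2.0 * math.pi)))


lam = 0.8
for n in (10, 100, 1000):
    ratio = math.exp(log_tree_size_pmf(lam, n) - log_asymptotic(lam, n))
    print(n, ratio)

# For lam <= 1 the tree is finite with probability 1, so the P_n sum to 1.
total = sum(math.exp(log_tree_size_pmf(lam, n)) for n in range(1, 3001))
print(total)
```

The printed ratios approach 1 at the rate suggested by Stirling's formula, roughly $1 - 1/(12n)$.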


REFERENCES

[1] R. A. Fisher, The Genetical Theory of Natural Selection, Oxford, The Clarendon Press, 1930.

[2] A. J. Lotka, Théorie Analytique des Associations Biologiques 2, Hermann and Co., Paris, 1939.

[3] D. Hawkins and S. Ulam, Theory of Multiplicative Processes I, MDDC-287, 1944.

[4] T. E. Harris, "Some theorems on the Bernoullian multiplicative process," thesis, doctor of philosophy, Princeton University, 1947.

[5] A. M. Yaglom, "Certain limit theorems of the theory of branching random processes," Doklady Akad. Nauk SSSR (N. S.), Vol. 56 (1947), pp. 795-798.

[6] D. G. Kendall, "On the generalized 'birth-and-death' process," Annals of Math. Stat., Vol. 19 (1948).

[7] A. Kolmogoroff, "Zur Lösung einer biologischen Aufgabe," Mitt. Forsch.-Inst. Math. u. Mech. Univ. Tomsk, Vol. 2 (1938), pp. 1-6.

[8] L. Pontrjagin, Topological Groups, Princeton Univ. Press, 1946.

[9] J. von Neumann, Functional Operators, mimeographed notes, Institute for Advanced Study, 1933-35.

[10] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Chelsea Publishing Co., New York, 1946.

[11] A. Hurwitz and R. Courant, Funktionentheorie, Springer, Berlin, 1929.

[12] M. G. Kendall, The Advanced Theory of Statistics, Vol. I, Griffin Co., London, 1943.





APPLICATION OF THE RADON-NIKODYM THEOREM TO THE 
THEORY OF SUFFICIENT STATISTICS¹

By Paul R. Halmos² and L. J. Savage
University of Chicago 

Summary. The body of this paper is written in terms of very general and 
abstract ideas which have been popular in pure mathematical work on the theory 
of probability for the last two or three decades. It seems to us that these ideas,
so fruitful in pure mathematics, have something to contribute to mathematical
statistics also, and this paper is an attempt to illustrate the sort of contribution
we have in mind. The purpose of generality here is not to solve immediate
practical problems, but rather to capture the logical essence of an important 
concept (sufficient statistic), and in particular to disentangle that concept from 
such ideas as Euclidean space, dimensionality, partial differentiation, and the 
distinction between continuous and discrete distributions, which seem to us 
extraneous. 

In accordance with these principles the center of the stage is occupied by a 
completely abstract sample space—that is a set X of objects x, to be thought 
of as possible outcomes of an experimental program, distributed according to an 
unknown one of a certain set of probability measures. Perhaps the most familiar 
concrete example in statistics is the one in which X is n dimensional Cartesian 
space, the points of which represent n independent observations of a normally 
distributed random variable with unknown parameters, and in which the 
probability measures considered are those induced by the various common 
normal distributions of the individual observations.

A statistic is defined, as usual, to be a function T of the outcome, whose 
values, however, are not necessarily real numbers but may themselves be abstract 
entities. Thus, in the concrete example, the entire set of n observations, or, 
less trivially, the sequence of all sample moments about the origin are statistics 
with values in an n dimensional and in an infinite dimensional space respectively. 
Another illuminating and very general example of a statistic may be obtained as 
follows. Suppose that the outcomes of two not necessarily statistically inde¬ 
pendent programs are thought of as one united outcome—then the outcome T 
of the first program alone is a statistic relative to the united program. A 
technical measure theoretic result, known as the Radon-Nikodym theorem, is 
important in the study of statistics such as T. It is, for example, essential 
to the very definition of the basic concept of conditional probability of a subset 
E of X given a value y of T. 

The statistic $T$ is called sufficient for the given set $\mathfrak{M}$ of probability measures

¹ This paper was the basis of a lecture delivered upon invitation of the Institute at the
meeting in Chicago on December 30, 1947.

² Fellow of the John Simon Guggenheim Memorial Foundation.



if (somewhat loosely speaking) the conditional probability of a subset $E$ of $X$
given a value $y$ of $T$ is the same for every probability measure in $\mathfrak{M}$. It is, for
instance, well known that the sample mean and variance together form a sufficient
statistic for the measures described in the concrete example.

The theory of sufficiency is in an especially satisfactory state for the case
in which the set $\mathfrak{M}$ of probability measures satisfies a certain condition described
by the technical term dominated. A set $\mathfrak{M}$ of probability measures is called
dominated if each measure in the set may be expressed as the indefinite integral
of a density function with respect to a fixed measure which is not itself necessarily
in the set. It is easy to verify that both classical extremes, commonly referred
to as the discrete and continuous cases, are dominated.

One possible formulation of the principal result concerning sufficiency for 
dominated sets is a direct generalization to the abstract case of the well known 
Fisher-Neyman result: T is sufficient if and only if the densities can be written as 
products of two factors, the first of which depends on the outcome through T 
only and the second of which is independent of the unknown measure. Another 
way of phrasing this result is to say that T is sufficient if and only if the likelihood 
ratio of every pair of measures in $\mathfrak{M}$ depends on the outcome through $T$ only.
The latter formulation makes sense even in the not necessarily dominated case
but unfortunately it is not true in that case. The situation can be patched up 
somewhat by introducing a weaker notion called pairwise sufficiency. 

In ordinary statistical parlance one often speaks of a statistic sufficient 
for some of several parameters. The abstract results mentioned above can 
undoubtedly be extended to treat this concept. 

1. Basic definitions and notations. A measurable space $(X, \mathbf{S})$ is a set $X$
and a $\sigma$-algebra $\mathbf{S}$ of subsets of $X$.³ If $(X, \mathbf{S})$ and $(Y, \mathbf{T})$ are measurable
spaces and if $T$ is a transformation from $X$ into $Y$ (or, in other words, if $T$
is a function with domain $X$ and range in $Y$), then $T$ is measurable if, for every $F$
in $\mathbf{T}$, $T^{-1}(F) \in \mathbf{S}$. If $Y$ is a Borel set in a finite dimensional Euclidean space,
then we shall always understand that $\mathbf{T}$ is the class of all Borel subsets of $Y$,
and the measurability of a function $f$ from $X$ to $Y$ will be expressed by the
notation $f\ (\in)\ \mathbf{S}$.

Throughout most of what follows it will be assumed that $(X, \mathbf{S})$ and $(Y, \mathbf{T})$
are fixed measurable spaces and that $T$ is a measurable transformation (also
called a statistic) from $X$ onto $Y$. A helpful example to keep in mind is the
Cartesian plane in the role of $X$, its horizontal coordinate axis in the role of $Y$,
and perpendicular projection from $X$ onto $Y$ in the role of $T$.

The following notations will be used. If $g$ is a point function on $Y$ (with
arbitrary range), then $gT$ is the point function on $X$ defined by $gT(x) = g(T(x))$.
If $\mu$ is a set function (with arbitrary range) on $\mathbf{S}$, then $\mu T^{-1}$ is the set function

³ A $\sigma$-algebra is a non empty class $\mathbf{S}$ of sets, closed under the formation of complements
and countable unions. If $(X, \mathbf{S})$ is a measurable space, the sets of $\mathbf{S}$ will be called the
measurable sets of $X$.





on $\mathbf{T}$ defined by $\mu T^{-1}(F) = \mu(T^{-1}(F))$. The class of all sets of the form $T^{-1}(F)$,
with $F \in \mathbf{T}$, will be denoted by $T^{-1}(\mathbf{T})$; the characteristic function of a set $A$
(in any space) will be denoted by $\chi_A$.

Lemma 1. If $g$ is any function on $Y$ and $A$ is any set in the range of $g$, then

$$\{x: gT(x) \in A\} = T^{-1}(\{y: g(y) \in A\});$$

hence, in particular, $\chi_{T^{-1}(F)} = \chi_F T$ for every subset $F$ of $Y$.⁴

Proof. The following statements are mutually equivalent: (a) $x_0 \in \{x: gT(x) \in A\}$,
(b) $g(T(x_0)) \in A$, (c) if $y_0 = T(x_0)$, then $g(y_0) \in A$, and (d)
$T(x_0) \in \{y: g(y) \in A\}$. The equivalence of the first and last ones of these
statements is exactly the assertion of the lemma.

We shall have frequent occasion to deal with functions on X which are induced 
by measurable functions on $Y$; the following result is a useful and direct structural
characterization of such functions. 

Lemma 2. If $f$ is a real valued function on $X$, then a necessary and sufficient
condition that there exist a measurable function $g$ on $Y$ such that $f = gT$ is that
$f\ (\in)\ T^{-1}(\mathbf{T})$; if such a function $g$ exists, then it is unique.⁵

Proof. The necessity of the condition is clear. To prove sufficiency,
suppose that $f\ (\in)\ T^{-1}(\mathbf{T})$, $y_0 \in Y$, and write $X_0 = T^{-1}(\{y_0\})$. Suppose $x_0 \in X_0$
and write $E = \{x: f(x) = f(x_0)\}$. Since $f\ (\in)\ T^{-1}(\mathbf{T})$, there is a set $F$ in $\mathbf{T}$ such
that $E = T^{-1}(F)$. Since $x_0 \in E$, it follows that $y_0 \in F$ and therefore that

$$X_0 = T^{-1}(\{y_0\}) \subset T^{-1}(F) = E.$$

In other words $f$ is a constant on $X_0$ and consequently the equation $g(y_0) = f(x_0)$
unambiguously defines a function $g$ on $Y$. The facts that $f = gT$ and that $g$ is
measurable are clear; the uniqueness of $g$ follows from the fact that $T$ maps
$X$ onto $Y$.

2. Measures and their derivatives. A measure is a real valued, non negative,
finite (and therefore bounded), countably additive function on the measurable
sets of a measurable space.⁶ An integral whose domain of integration is not
indicated is always to be extended over the whole space. If the symbol
$[\mu]$, pronounced "modulo $\mu$", follows an assertion concerning the points $x$ of
$X$, it is to be understood that the set $E$ of those points for which the assertion
is not true is such that $E \in \mathbf{S}$ and $\mu(E) = 0$. Thus, for instance, if $f$
and $g$ are functions (with arbitrary range) on $X$, then $f = g\ [\mu]$ means that

⁴ The symbol $\{- : -\}$ stands for the set of all those objects named before the colon
which satisfy the condition stated after it.

⁵ The notation $f\ (\in)\ T^{-1}(\mathbf{T})$ means of course that $f$ is a measurable function not only on the
measurable space $(X, \mathbf{S})$ but also on the measurable space $(X, T^{-1}(\mathbf{T}))$. The restriction to
real valued functions is inessential and is made only in order to avoid the introduction
of more notation.

⁶ Although most of the measures occurring in the applications of our theory are probability
measures (i.e. measures whose value for the whole space is 1), the consideration of probability
measures only is, in many of the proofs in the sequel, both unnecessary and insufficient.





$\mu(\{x: f(x) \ne g(x)\}) = 0$. Similarly, if $f$ is a real valued function on $X$, then
$f\ (\in)\ T^{-1}(\mathbf{T})\ [\mu]$ means that there exists a real valued function $g$ on $X$ such
that $g\ (\in)\ T^{-1}(\mathbf{T})$ and $f = g\ [\mu]$.

If $\mu$ and $\nu$ are two measures on $\mathbf{S}$, $\nu$ is absolutely continuous with respect to $\mu$,
in symbols $\nu \ll \mu$, if $\nu(E) = 0$ for every measurable set $E$ for which $\mu(E) = 0$.
The measures $\mu$ and $\nu$ are equivalent, in symbols $\mu \equiv \nu$, if simultaneously $\mu \ll \nu$
and $\nu \ll \mu$.⁷ One of the most useful results concerning absolute continuity is the
Radon-Nikodym theorem, which may be stated as follows.⁸

A necessary and sufficient condition that $\nu \ll \mu$ is that there exist a non negative
function $f$ on $X$ such that

$$\nu(E) = \int_E f(x)\,d\mu(x)$$

for every $E$ in $\mathbf{S}$. The function $f$ is unique in the sense that if also

$$\nu(E) = \int_E g(x)\,d\mu(x)$$

for every $E$ in $\mathbf{S}$, then $f = g\ [\mu]$. If $\nu(E) \le \mu(E)$ for every $E$ in $\mathbf{S}$, then
$0 \le f(x) \le 1\ [\mu]$.

It is customary and suggestive to write $f = d\nu/d\mu$. Since $d\nu/d\mu$ is determined
only to within a set for which $\mu$ vanishes, it follows that in a relation of the form

$$\frac{d\nu}{d\mu}\ (\in)\ T^{-1}(\mathbf{T})\ [\mu]$$

the symbol $[\mu]$ is superfluous and may be omitted.

For typographical and heuristic reasons it is convenient sometimes to write the
relation $f = d\nu/d\mu$ in the form $d\nu = f\,d\mu$; all the properties of Radon-Nikodym
derivatives which are suggested by the well known differential formalism
correspond to true theorems. Some of the ones that we shall make use of are
trivial (e.g. $d\nu_1 = f_1\,d\mu$ and $d\nu_2 = f_2\,d\mu$ imply $d(\nu_1 + \nu_2) = (f_1 + f_2)\,d\mu$), while
others are well known facts in integration theory (e.g. (i) $d\lambda = f\,d\nu$ and $d\nu = g\,d\mu$
imply $d\lambda = fg\,d\mu$, and (ii) $d\nu = f\,d\mu$ and $d\mu = g\,d\nu$ imply $fg = 1\ [\mu]$).
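
On a finite space, where every measure is given by point masses, the Radon-Nikodym derivative reduces to a ratio of masses, and the differential formalism can be checked directly. The following is a minimal sketch under that finiteness assumption; `rn_derivative` and `integrate` are illustrative helpers, and the value 0 assigned on points of $\mu$-measure zero is an arbitrary choice permitted by the "modulo $\mu$" convention.

```python
from fractions import Fraction


def rn_derivative(nu, mu):
    """d(nu)/d(mu) on a finite set, assuming nu << mu.
    Where mu({x}) = 0 the derivative is arbitrary; 0 is used here."""
    for x in mu:
        if mu[x] == 0 and nu.get(x, 0) != 0:
            raise ValueError("nu is not absolutely continuous w.r.t. mu")
    return {x: (Fraction(nu.get(x, 0)) / mu[x] if mu[x] != 0 else Fraction(0))
            for x in mu}


def integrate(f, mu, E):
    """Integral of the point function f over the set E with respect to mu."""
    return sum(f[x] * mu[x] for x in E)


F = Fraction
mu = {"a": F(1, 2), "b": F(1, 4), "c": F(1, 4), "d": F(0)}
nu = {"a": F(1, 4), "b": F(1, 2), "c": F(0), "d": F(0)}     # nu << mu
lam = {"a": F(1, 2), "b": F(1, 2), "c": F(0), "d": F(0)}    # lam << nu

g = rn_derivative(nu, mu)       # d nu  = g d mu
f = rn_derivative(lam, nu)      # d lam = f d nu
fg = {x: f[x] * g[x] for x in mu}

# Chain rule (i): d lam = f g d mu, checked on an arbitrary set E.
E = ["a", "c", "d"]
print(integrate(fg, mu, E), sum(lam[x] for x in E))
```

The same helper can be used to check rule (ii) by taking two measures with the same null sets and multiplying the two derivatives.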

We conclude this section with a simple but useful result concerning the
transformations of integrals.

Lemma 3. If $g$ is a real valued function on $Y$ and $\mu$ is a measure on $\mathbf{S}$, then

$$\int_F g(y)\,d\mu T^{-1}(y) = \int_{T^{-1}(F)} gT(x)\,d\mu(x)$$

for every $F$ in $\mathbf{T}$, in the sense that if either integral exists, then so does the other and
the two are equal.


⁷ It is clear that the relation of equivalence is reflexive, symmetric, and transitive,
and hence deserves its name.

⁸ For a proof of the Radon-Nikodym theorem and similar facts concerning the measure
and integration theory which we employ, see S. Saks, Theory of the Integral, Warszawa-Lwów, 1937.





Proof. Replacing $g$ by $g\chi_F$ we see that it is sufficient to consider the case
$F = Y$. The proof for this case follows from the observation that every
approximating sum

$$\sum_i g(y_i)\,\mu T^{-1}(F_i)$$

of $\int g\,d\mu T^{-1}$ is also an approximating sum

$$\sum_i gT(x_i)\,\mu(E_i)$$

of $\int gT\,d\mu$, and conversely.⁹

3. Conditional probabilities and expectations. Lemma 4. If $\mu$ and $\nu$ are
measures on $\mathbf{S}$ such that $\nu \ll \mu$, then $\nu T^{-1} \ll \mu T^{-1}$.

Proof. If $F \in \mathbf{T}$ and $0 = \mu T^{-1}(F) = \mu(T^{-1}(F))$, then

$$0 = \nu(T^{-1}(F)) = \nu T^{-1}(F).^{10}$$

Lemma 4 is the basis of the definition of a concept of great importance in
probability theory. If $\mu$ is a measure on $\mathbf{S}$ and $f$ is a non negative integrable
function on $X$, then the measure $\nu$ defined by $d\nu = f\,d\mu$ is absolutely continuous
with respect to $\mu$. It follows from Lemma 4 that $\nu T^{-1}$ is absolutely continuous
with respect to $\mu T^{-1}$; we write $d\nu T^{-1} = g\,d\mu T^{-1}$. The function value $g(y)$ is
known as the conditional expectation of $f$ given $y$ (or given that $T(x) = y$); we
shall denote it by $e_\mu(f \mid y)$. If $f = \chi_E$ is the characteristic function of a set $E$ in
$\mathbf{S}$, then $e_\mu(f \mid y)$ is known as the conditional probability of $E$ given $y$; we shall
denote it by $p_\mu(E \mid y)$.¹¹
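
On a finite sample space the definition just given can be carried out literally: form $\nu$ with $d\nu = f\,d\mu$, push both measures forward by $T$, and divide. The sketch below does this with exact rational arithmetic; the space (pairs of numbers from 1 to 3), the statistic (their sum) and the helper names are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from collections import defaultdict


def pushforward(mu, T):
    """The measure mu T^{-1} on Y: the mass at y is mu(T^{-1}({y}))."""
    out = defaultdict(Fraction)
    for x, mass in mu.items():
        out[T(x)] += mass
    return dict(out)


def conditional_expectation(f, mu, T):
    """e_mu(f | y) = d(nu T^{-1}) / d(mu T^{-1}), where d nu = f d mu.
    Only defined (here) at points y of positive mu T^{-1} mass."""
    nu = {x: f(x) * mass for x, mass in mu.items()}
    mu_T, nu_T = pushforward(mu, T), pushforward(nu, T)
    return {y: nu_T[y] / mu_T[y] for y in mu_T if mu_T[y] != 0}


X = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3)]
mu = {x: Fraction(1, 9) for x in X}          # uniform measure on X
T = lambda x: x[0] + x[1]                    # the statistic: the sum

e = conditional_expectation(lambda x: x[0], mu, T)
p_E = conditional_expectation(lambda x: 1 if x[0] == 1 else 0, mu, T)
print(e)      # conditional expectation of the first coordinate given the sum
print(p_E)    # conditional probability of E = {first coordinate is 1}
```

By symmetry the conditional expectation of the first coordinate given the sum $y$ is $y/2$, which is what the computation returns.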

The abstract nature of these definitions makes an intuitive justification of
them desirable. Observe that since

$$\nu T^{-1}(F) = \nu(T^{-1}(F)) = \int_{T^{-1}(F)} f(x)\,d\mu(x),$$

the defining equation of $e_\mu(f \mid y)$, written out in full detail, takes the form

$$\int_{T^{-1}(F)} f(x)\,d\mu(x) = \int_F e_\mu(f \mid y)\,d\mu T^{-1}(y), \qquad F \in \mathbf{T}.$$

⁹ It is of interest to observe that either side of the equation in Lemma 3 may be obtained
from the other by the formal substitution $y = T(x)$. A special case of this lemma is the
celebrated and often misunderstood assertion that the expectation of a random variable is
equal to the first moment of its distribution function.

¹⁰ That the converse of Lemma 4 is not true is shown by the following example. Let $X$ be
the unit square, let $Y$ be the unit interval, and let $T$ be the perpendicular projection from
$X$ onto $Y$. Let $\mu$ be ordinary (Borel-Lebesgue) measure and let $\nu$ be linear measure on the
intersection of $X$ with, say, the horizontal line whose ordinate is ½. Clearly $\nu$ is not
absolutely continuous with respect to $\mu$, but $\nu T^{-1} = \mu T^{-1}$.

¹¹ Definitions in this form were first proposed by A. Kolmogoroff, Grundbegriffe der
Wahrscheinlichkeitsrechnung, Berlin, 1933. With a slight amount of additional trouble,
conditional expectation could be defined for more general functions, but only the non
negative case will occur in our applications.





If $f = \chi_E$, then this equation becomes the defining equation of $p_\mu(E \mid y)$:

$$\mu(E \cap T^{-1}(F)) = \int_F p_\mu(E \mid y)\,d\mu T^{-1}(y), \qquad F \in \mathbf{T}.^{12}$$

The customary definition of "the conditional probability of $E$ given that $T(x) \in F$"
is $\mu(E \cap T^{-1}(F))/\mu(T^{-1}(F))$, (assuming that the denominator does not vanish).
Since $\mu(T^{-1}(F)) = \mu T^{-1}(F)$, we have

$$\frac{\mu(E \cap T^{-1}(F))}{\mu(T^{-1}(F))} = \frac{1}{\mu T^{-1}(F)}\int_F p_\mu(E \mid y)\,d\mu T^{-1}(y).$$

It is now formally plausible that if "$F$ shrinks to a point $y$," then the left side
of the last written equation should tend to the conditional probability of $E$
given $y$ and the right side should tend to the integrand $p_\mu(E \mid y)$. The use of
the Radon-Nikodym differentiation theorem is a rigorous substitute for this
rather shaky difference quotient approach.

Since $p_\mu(E \mid y)$ is determined, for each $E$, only to within a set for which $\mu T^{-1}$
vanishes, it would be too optimistic to expect that, for each $y$, it behaves, regarded
as a function of $E$, like a measure. It is, however, easy to prove that

(i) $p_\mu(X \mid y) = 1\ [\mu T^{-1}]$,

(ii) $0 \le p_\mu(E \mid y) \le 1\ [\mu T^{-1}]$,

(iii) if $\{E_n\}$ is a disjoint sequence of measurable sets, then $p_\mu(\bigcup_n E_n \mid y) = \sum_n p_\mu(E_n \mid y)\ [\mu T^{-1}]$.

The exceptional sets of measure zero depend in general on $E$ in (ii) and on the
particular sequence $\{E_n\}$ in (iii). It is interesting to observe that, despite the
fact that $\mu$ need not be a probability measure, $p_\mu$ turns out always to have the
normalization property (i). It is natural to ask whether or not the indeterminacy
of $p_\mu(E \mid y)$ may be resolved, for each $E$, in such a way that the resulting function
is a measure for each $y$, except possibly for a fixed set of $y$'s on which $\mu T^{-1}$
vanishes. Doob¹⁴ has shown that this is the case when $X$ is the real line; in the
general case such a resolution is impossible. Fortunately, however, conditional
probabilities are sufficiently tractable for most practical and theoretical purposes,
and the requirement that they should behave like probability measures in the
strict sense described above is almost never needed.

¹² We observe that it is not sufficient to require this for $F = Y$ only, i.e. to require
$\mu(E) = \int p_\mu(E \mid y)\,d\mu T^{-1}(y)$. This special equation is satisfied by many functions which
do not deserve the name conditional probability; e.g. it is satisfied by $p_\mu(E \mid y) =$
constant $= \mu(E)/\mu T^{-1}(Y)$.

¹³ See J. L. Doob, "Stochastic processes with an integral-valued parameter," Am. Math.
Soc. Trans., Vol. 44 (1938), pp. 96-98.

¹⁴ See Doob, loc. cit. Doob asserts the theorem in much greater generality, but his
proof is incorrect. The error in the proof and a counterexample to the general theorem
were communicated to us by J. Dieudonné in a letter dated September 4, 1947. Doob's
proof is valid for more general spaces than the real line (e.g. for finite dimensional Euclidean
spaces and for compact metric spaces). The details of Dieudonné's counterexample will
appear in a forthcoming book (entitled Measure theory) by Halmos.





We conclude this section with two easy but useful results which might also 
serve as illustrations of the method of finding conditional probabilities and 
expectations in certain special cases. 

Lemma 5. If $\mu$ is a measure on $\mathbf{S}$, if $g$ is a non negative function on $Y$, integrable
with respect to $\mu T^{-1}$, and if $\nu$ is the measure on $\mathbf{S}$ defined by $d\nu = gT\,d\mu$, then
$d\nu T^{-1} = g\,d\mu T^{-1}$, or, equivalently, $e_\mu(gT \mid y) = g(y)\ [\mu T^{-1}]$.

Proof. From $\nu(E) = \int_E gT(x)\,d\mu(x)$ and Lemma 3 it follows that

$$\nu T^{-1}(F) = \nu(T^{-1}(F)) = \int_F g(y)\,d\mu T^{-1}(y).$$

Lemma 6. If $\mu$ is a measure on $\mathbf{S}$, if $f$ and $g$ are non negative functions on $X$ and
$Y$ respectively, and if $f$, $gT$, and $f \cdot gT$ are all integrable with respect to $\mu$, then

$$e_\mu(f \cdot gT \mid y) = e_\mu(f \mid y) \cdot g(y)\ [\mu T^{-1}].$$

Hence, in particular, if $F \in \mathbf{T}$, then

$$p_\mu(E \cap T^{-1}(F) \mid y) = p_\mu(E \mid y)\,\chi_F(y)\ [\mu T^{-1}]$$

for every $E$ in $\mathbf{S}$.

Proof. If $d\nu = f\,d\mu$, then, by definition of $e_\mu$, $\nu T^{-1}(F) = \int_F e_\mu(f \mid y)\,d\mu T^{-1}(y)$.
Applications of Lemmas 3 and 5 yield

$$\int_F e_\mu(f \mid y)\,g(y)\,d\mu T^{-1}(y) = \int_F g(y)\,d\nu T^{-1}(y) = \int_{T^{-1}(F)} gT(x)\,d\nu(x)$$

$$= \int_{T^{-1}(F)} f(x)\,gT(x)\,d\mu(x) = \int_F e_\mu(f \cdot gT \mid y)\,d\mu T^{-1}(y),$$

and therefore the desired conclusion follows from the uniqueness assertion of the
Radon-Nikodym theorem.
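
Lemmas 5 and 6 say that a factor depending on $x$ only through $T(x)$ may be taken outside a conditional expectation. A finite check with exact arithmetic is sketched below; the measure (deliberately not a probability measure), the functions $f$ and $g$, and the helper names are all illustrative assumptions.

```python
from fractions import Fraction
from collections import defaultdict


def pushforward(mu, T):
    """The measure mu T^{-1} on Y."""
    out = defaultdict(Fraction)
    for x, mass in mu.items():
        out[T(x)] += mass
    return dict(out)


def conditional_expectation(f, mu, T):
    """e_mu(f | y), computed as the ratio of the pushed-forward masses."""
    nu = {x: f(x) * mass for x, mass in mu.items()}
    mu_T, nu_T = pushforward(mu, T), pushforward(nu, T)
    return {y: nu_T[y] / mu_T[y] for y in mu_T if mu_T[y] != 0}


X = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3)]
mu = {(i, j): Fraction(i + 2 * j) for (i, j) in X}   # total mass 54, not 1
T = lambda x: x[0] + x[1]
f = lambda x: x[0] ** 2          # a function on X
g = lambda y: y + 1              # a function on Y

lhs = conditional_expectation(lambda x: f(x) * g(T(x)), mu, T)
e_f = conditional_expectation(f, mu, T)
rhs = {y: e_f[y] * g(y) for y in e_f}
print(lhs == rhs)

# Lemma 5 as the special case f = 1, and the normalization p(X | y) = 1.
e_gT = conditional_expectation(lambda x: g(T(x)), mu, T)
p_X = conditional_expectation(lambda x: 1, mu, T)
print(e_gT, p_X)
```

The last line also shows the normalization property (i) of Section 3 holding even though the total mass of the measure is not 1.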

4. Dominated sets of measures. In many statistical situations it is necessary 
to consider simultaneously several measures on the same cr-algebra. The 
concept of absolute continuity is easily extended to sets of measures. If $2 
and 91 are two sets of measures on S and if, for every set Em S, the vanishing of 
p(jE) for every p in 912 implies the vanishing of v(E) for every v in 92, then we 
shall call 92 absolutely continuous with respect to 932 and write 91 « 9J2. If 
92« 992 and 992« 92, the sets 932 and 92 are oalled equivalent and we write 932 m 92. 
If, in particular, 932 oontains exactly one measure p, 992 *» {p}, the abbreviated 
notations 92 <SC p, p « 92, and p « 92, will be employed for 92 992, 992 « 92, and 

992 m 91, respectively, 

A set $\mathfrak{M}$ of measures on $\mathbf{S}$ will be called dominated if there exists a measure $\lambda$ on $\mathbf{S}$ (not necessarily in $\mathfrak{M}$) such that $\mathfrak{M} \ll \lambda$. In applications there frequently occur sets of measures which are dominated in a sense apparently weaker than the one just defined; weaker in that the measure $\lambda$, which may for instance be



232 


PAUL R. HALMOS AND L. J. SAVAGE 


Lebesgue measure on the Borel sets of a finite dimensional Euclidean space, is not necessarily finite. It is easy to see, however, that whenever $\lambda$ has the property (possessed by Lebesgue measure) that the space $X$ is the union of countably many sets of finite measure, then a finite measure equivalent to $\lambda$ exists and the two possible definitions of domination coincide.

The following result on dominated sets of measures may be found to have 
some interest of its own and will be applied in the sequel. 

Lemma 7. Every dominated set of measures has an equivalent countable subset. 

Proof. Let $\mathfrak{M}$ be a dominated set of measures on $\mathbf{S}$, $\mathfrak{M} \ll \lambda$; for any $\mu$ in $\mathfrak{M}$ write $f_\mu = d\mu/d\lambda$ and $K_\mu = \{x : f_\mu(x) > 0\}$. We define (for the purposes of this proof only) a kernel as a set $K$ in $\mathbf{S}$ such that, for some measure $\mu$ in $\mathfrak{M}$, $K \subset K_\mu$ and $\mu(K) > 0$; we define a chain as a disjoint union of kernels. Since $\lambda(K) > 0$ for every kernel $K$, it follows from the finiteness of $\lambda$ that every chain is a countable disjoint union of kernels. It follows also from these definitions that if $C$ is a measurable subset of a chain, such that $\mu(C) > 0$ for at least one measure $\mu$ in $\mathfrak{M}$, then $C$ is a chain, and that a disjoint union of chains is a chain. The last two remarks imply, through the usual process of disjointing any countable union, that a countable (but not necessarily disjoint) union of chains is a chain.

Let $\{C_j\}$ be a sequence of chains such that, as $j \to \infty$, $\lambda(C_j)$ approaches the supremum of the values of $\lambda$ on chains. If $C = \bigcup_{j=1}^{\infty} C_j$, then $C$ is a chain for which $\lambda(C)$ is maximal. The definition of a chain yields the existence of a sequence $\{K_i\}$ of kernels such that $C = \bigcup_{i=1}^{\infty} K_i$, and the definition of a kernel yields the existence, for each $i = 1, 2, \cdots$, of a measure $\mu_i$ in $\mathfrak{M}$ such that $K_i \subset K_{\mu_i}$ and $\mu_i(K_i) > 0$. We write $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$; since $\mathfrak{N} \subset \mathfrak{M}$, the relation $\mathfrak{N} \ll \mathfrak{M}$ is trivial. We shall prove that $\mathfrak{M} \ll \mathfrak{N}$.

Suppose that $E \in \mathbf{S}$, $\mu_i(E) = 0$ for $i = 1, 2, \cdots$, and let $\mu$ be any measure in $\mathfrak{M}$. It is to be proved that $\mu(E) = 0$. Since $\mu(E - K_\mu) = 0$, there is no loss of generality in assuming that $E \subset K_\mu$. If $\mu(E - C) > 0$, then $\lambda(E - C) > 0$ and therefore (since $E - C$ is a kernel) $E \cup C$ is a chain with $\lambda(E \cup C) > \lambda(C)$. Since this is impossible, it follows that $\mu(E - C) = 0$. Since

$$0 = \mu_i(E) \ge \mu_i(E \cap K_i) = \int_{E \cap K_i} f_{\mu_i}\,d\lambda$$

and since $K_i \subset K_{\mu_i}$, it follows that $\lambda(E \cap K_i) = 0$. We conclude that $\lambda(E \cap C) = \sum_{i=1}^{\infty} \lambda(E \cap K_i) = 0$ and therefore $\mu(E \cap C) = 0$. Since $\mu(E) = \mu(E - C) + \mu(E \cap C)$, the proof of the lemma is complete.


5. Sufficient statistics for dominated sets. The statistic $T$ is sufficient for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ if, for every $E$ in $\mathbf{S}$, there exists a measurable function $p = p(E \mid y)$ on $Y$, such that

$$p_\mu(E \mid y) = p(E \mid y) \quad [\mu T^{-1}]$$

for every $\mu$ in $\mathfrak{M}$.$^{14}$ In other words, $T$ is sufficient for $\mathfrak{M}$ if there exists a conditional probability function common to every $\mu$ in $\mathfrak{M}$, or, crudely speaking, if the conditional distribution induced by $T$ is independent of $\mu$.

14 The original definition of sufficiency was given by R. A. Fisher, "On the mathematical foundations of theoretical statistics," Roy. Soc. Phil. Trans., Series A, Vol. 222 (1922), pp. 309-368.

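Theorem 1. A necessary and sufficient condition that the statistic $T$ be sufficient for a dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that there exist a measure $\lambda$ on $\mathbf{S}$ such that $\mathfrak{M} \equiv \lambda$ and such that $d\mu/d\lambda \;(\epsilon)\; T^{-1}(\mathbf{T})$ for every $\mu$ in $\mathfrak{M}$.

Proof. Let $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$ be a countable subset of $\mathfrak{M}$ which is equivalent to $\mathfrak{M}$ (Lemma 7), and write $\lambda$ for the measure

$$\lambda(E) = \sum_{i=1}^{\infty} a_i\,\mu_i(E),$$

where $a_i = 1/(2^i \mu_i(X))$, $i = 1, 2, \cdots$. Clearly $\mathfrak{M} \equiv \lambda$.

Suppose that $T$ is sufficient for $\mathfrak{M}$ and that $p$ is a conditional probability function common to every $\mu$ in $\mathfrak{M}$. Then, for every $E$ in $\mathbf{S}$ and $F$ in $\mathbf{T}$,

$$\lambda(E \cap T^{-1}(F)) = \sum_{i=1}^{\infty} a_i\,\mu_i(E \cap T^{-1}(F)) = \sum_{i=1}^{\infty} a_i \int_F p(E \mid y)\,d\mu_i T^{-1}(y) = \int_F p(E \mid y)\,d\lambda T^{-1}(y),$$

i.e. $p$ serves also as a conditional probability for $\lambda$. Take any fixed $\mu$ in $\mathfrak{M}$, write $d\mu/d\lambda = f$ and $e_\lambda(f \mid y) = g(y)$, so that $d\mu T^{-1} = g\,d\lambda T^{-1}$, and we have, for every $E$ in $\mathbf{S}$,

$$\int_E f(x)\,d\lambda(x) = \mu(E) = \int p(E \mid y)\,d\mu T^{-1}(y)$$

$$= \int p(E \mid y)\,g(y)\,d\lambda T^{-1}(y) = \int e_\lambda(\chi_E \mid y)\,e_\lambda(gT \mid y)\,d\lambda T^{-1}(y)$$

$$= \int e_\lambda(\chi_E \cdot gT \mid y)\,d\lambda T^{-1}(y) = \int \chi_E(x)\,gT(x)\,d\lambda(x) = \int_E gT(x)\,d\lambda(x).$$

The desired result, $f(x) = gT(x)$ $[\lambda]$, follows from a comparison of the first and last terms in the last written chain of equations.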

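Suppose now, conversely, that $\lambda$ has the stated properties. We shall prove that $p_\lambda$ is a conditional probability function for every $\mu$ in $\mathfrak{M}$. Take any fixed $\mu$ in $\mathfrak{M}$ and any fixed $E$ in $\mathbf{S}$, write $d\mu/d\lambda = gT$, and let $\nu$ be the measure defined by $d\nu = \chi_E\,d\mu$. By definition of $p_\mu$ we have $d\nu T^{-1} = p_\mu\,d\mu T^{-1}$, and Lemma 5 gives $d\mu T^{-1} = g\,d\lambda T^{-1}$; together these imply that

$$d\nu T^{-1} = p_\mu \cdot g\,d\lambda T^{-1}.$$

On the other hand $d\nu = \chi_E\,d\mu = \chi_E \cdot gT\,d\lambda$, so that

$$d\nu T^{-1} = e_\lambda\,d\lambda T^{-1},$$

where $e_\lambda = e_\lambda(\chi_E \cdot gT \mid y) = p_\lambda(E \mid y)\,g(y)$. It follows from a comparison of the two expressions for $d\nu T^{-1}$ that

$$p_\mu(E \mid y)\,g(y) = p_\lambda(E \mid y)\,g(y) \quad [\lambda T^{-1}].$$

Since the relation $d\mu T^{-1} = g\,d\lambda T^{-1}$ clearly implies that $g(y) \ne 0$ $[\mu T^{-1}]$ (i.e. that $\mu T^{-1}(\{y : g(y) = 0\}) = 0$), it follows, finally, that

$$p_\mu(E \mid y) = p_\lambda(E \mid y) \quad [\mu T^{-1}].$$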

6. Special criteria for sufficiency. Theorem 1 may be recast in a form more akin in spirit to previous investigations of the concept of sufficiency.$^{15}$

Corollary 1. A necessary and sufficient condition that the statistic $T$ be sufficient for a dominated set $\mathfrak{M}$ ($\ll \lambda_0$) of measures on $\mathbf{S}$ is that, for every $\mu$ in $\mathfrak{M}$, $f_\mu = d\mu/d\lambda_0$ be factorable in the form $f_\mu = g_\mu \cdot t$, where $0 \le g_\mu \;(\epsilon)\; T^{-1}(\mathbf{T})$, $0 \le t$, $t$ and $g_\mu \cdot t$ are integrable with respect to $\lambda_0$, and $t$ vanishes $[\lambda_0]$ on each set in $\mathbf{S}$ for which every $\mu$ in $\mathfrak{M}$ vanishes.

In more customary statistical language the condition asserts essentially that "each density is factorable into a function of the statistic alone and a function independent of the parameter."

Proof. If $T$ is sufficient for $\mathfrak{M}$, then there exists a measure $\lambda$ with the properties described in Theorem 1. It follows that

$$f_\mu = \frac{d\mu}{d\lambda_0} = \frac{d\mu}{d\lambda} \cdot \frac{d\lambda}{d\lambda_0}$$

and we may write $g_\mu = d\mu/d\lambda$ and $t = d\lambda/d\lambda_0$. The only assertion that is not immediately obvious is the one concerning the vanishing of $t$. To prove it, suppose that $\mu(E) = 0$ for every $\mu$ in $\mathfrak{M}$; the fact that then

$$0 = \lambda(E) = \int_E t(x)\,d\lambda_0(x)$$

implies the desired conclusion.

If, conversely, $f_\mu = g_\mu t$, then we may write $d\lambda = t\,d\lambda_0$. The relation $\mathfrak{M} \equiv \lambda$ follows from the statement concerning the vanishing of $t$, and the relation $d\mu/d\lambda \;(\epsilon)\; T^{-1}(\mathbf{T})$ is implied by the equation $d\mu = g_\mu t\,d\lambda_0 = g_\mu\,d\lambda$.
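The factorization in Corollary 1 can be illustrated on a concrete family. The sketch below is an illustration only, not part of the original argument: it assumes an i.i.d. Poisson sample, takes counting measure as the dominating measure (σ-finite rather than finite, which the remarks in Section 4 allow), and uses the hypothetical helper names `g_factor` and `t_factor` for the two factors.

```python
import math

def poisson_joint_density(xs, theta):
    """Joint density of an i.i.d. Poisson(theta) sample w.r.t. counting measure."""
    return math.prod(math.exp(-theta) * theta**x / math.factorial(x) for x in xs)

def g_factor(t_value, n, theta):
    """Factor that depends on the sample only through T(x) = sum(x)."""
    return math.exp(-n * theta) * theta**t_value

def t_factor(xs):
    """Factor that does not involve the parameter theta."""
    return 1.0 / math.prod(math.factorial(x) for x in xs)

sample = [2, 0, 3, 1]
for theta in (0.5, 1.0, 2.5):
    lhs = poisson_joint_density(sample, theta)
    rhs = g_factor(sum(sample), len(sample), theta) * t_factor(sample)
    # The density splits as (function of T and theta) * (function of x alone).
    assert math.isclose(lhs, rhs, rel_tol=1e-12)
```

Under these assumptions the sum of the observations plays the role of $T$, and the product of inverse factorials plays the role of $t$.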

For the statement of the next consequence of Theorem 1 it is convenient to call a set $\mathfrak{M}$ of measures on $\mathbf{S}$ homogeneous if $\mu \equiv \nu$ for every $\mu$ and $\nu$ in $\mathfrak{M}$.

Corollary 2. A necessary and sufficient condition that the statistic $T$ be sufficient for a homogeneous set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for every $\mu$ and $\nu$ in $\mathfrak{M}$, $d\nu/d\mu \;(\epsilon)\; T^{-1}(\mathbf{T})$.

Proof. Since a homogeneous set is dominated (by any one of its elements), Theorem 1 is applicable. If $T$ is sufficient for $\mathfrak{M}$ and if $\lambda$ has the properties described in Theorem 1, then $d\nu/d\mu = (d\nu/d\lambda)/(d\mu/d\lambda)$. The converse follows, through Theorem 1, by letting $\lambda$ be any measure in $\mathfrak{M}$.

We shall say that the statistic $T$ is pairwise sufficient for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ if it is sufficient for every pair $\{\mu, \nu\}$ of measures in $\mathfrak{M}$. In other words, $T$ is pairwise sufficient for $\mathfrak{M}$ if, for every $E$ in $\mathbf{S}$ and $\mu$ and $\nu$ in $\mathfrak{M}$, there exists a measurable function $p_{\mu\nu}(E \mid y)$ on $Y$ such that

$$p_\mu(E \mid y) = p_{\mu\nu}(E \mid y) \;\; [\mu T^{-1}] \quad \text{and} \quad p_\nu(E \mid y) = p_{\mu\nu}(E \mid y) \;\; [\nu T^{-1}].$$

15 See J. Neyman, "Su un teorema concernente le cosiddette statistiche sufficienti," Inst. Ital. Att. Giorn., Vol. 6 (1935), pp. 320-334. In this paper Neyman is somewhat restricted by his use of classical analytical methods, but he points out the possibility and desirability of extending his results to a much more general domain. For a recent presentation of the theory and further references to the literature cf. H. Cramér, Mathematical Methods of Statistics, Princeton, 1946.

Since pairwise sufficiency is (at least apparently) weaker than sufficiency, it is 
not surprising that there is a simple criterion for it even in the case of quite 
arbitrary (not necessarily homogeneous or dominated) sets of measures. 

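Corollary 3. A necessary and sufficient condition that $T$ be pairwise sufficient for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures $\mu$ and $\nu$ in $\mathfrak{M}$, $d\mu/d(\mu + \nu) \;(\epsilon)\; T^{-1}(\mathbf{T})$.

Proof. If $T$ is sufficient for $\mu$ and $\nu$, then there exists a measure $\lambda \equiv \mu + \nu$ such that $d\mu/d\lambda \;(\epsilon)\; T^{-1}(\mathbf{T})$ and $d\nu/d\lambda \;(\epsilon)\; T^{-1}(\mathbf{T})$. It follows that

$$\frac{d\mu}{d(\mu + \nu)} = \frac{d\mu}{d\lambda} \Big/ \frac{d(\mu + \nu)}{d\lambda} = \frac{d\mu}{d\lambda} \Big/ \left(\frac{d\mu}{d\lambda} + \frac{d\nu}{d\lambda}\right) \;(\epsilon)\; T^{-1}(\mathbf{T}).$$

The sufficiency of the condition follows immediately by applying Theorem 1 to the two-element set $\{\mu, \nu\}$.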
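On a finite sample space the criterion of Corollary 3 can be checked by direct enumeration. The following sketch is an illustration under stated assumptions: the space of head/tail sequences of length 3, the two head probabilities 1/2 and 1/4, and the helper name `sequence_measure` are all chosen here for the example. It verifies that $d\mu/d(\mu + \nu)$ takes a single value on each set where $T$ (the number of heads) is constant.

```python
from fractions import Fraction
from itertools import product

def sequence_measure(p, n):
    """Probability of each head/tail sequence of length n when P(head) = p."""
    return {x: p**sum(x) * (1 - p)**(n - sum(x)) for x in product((0, 1), repeat=n)}

n = 3
mu = sequence_measure(Fraction(1, 2), n)
nu = sequence_measure(Fraction(1, 4), n)

# On a finite space the Radon-Nikodym derivative is a pointwise quotient.
derivative = {x: mu[x] / (mu[x] + nu[x]) for x in mu}

# Collect the values of the derivative on each level set of T(x) = number of heads.
values_by_t = {}
for x, value in derivative.items():
    values_by_t.setdefault(sum(x), set()).add(value)

# The derivative is a function of T alone iff each level set carries one value.
depends_only_on_t = all(len(values) == 1 for values in values_by_t.values())
```

Exact rational arithmetic is used so that equality of the values is not blurred by rounding.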

7. Pairwise sufficiency and likelihood ratios. It is sometimes convenient to express the result of Corollary 3 in slightly different language. If $\lambda$ is a measure on $\mathbf{S}$ and if $f$ and $g$ are real valued measurable functions on $X$ such that $\lambda(\{x : f(x) = g(x) = 0\}) = 0$, we shall say that the pair $(f, g)$ is admissible $[\lambda]$. (Intuitively an admissible pair $(f, g)$ is to be thought of as a ratio $f/g$, which, however, may not be formed directly at the points $x$ for which $g(x) = 0$.) Two admissible pairs $(f_1, g_1)$ and $(f_2, g_2)$ will be called equivalent $[\lambda]$, in symbols $(f_1, g_1) \equiv (f_2, g_2)$ $[\lambda]$, if there exists a real valued measurable function $t$ on $X$ such that $t(x) \ne 0$ $[\lambda]$ and such that $f_1 = t f_2$ and $g_1 = t g_2$ $[\lambda]$. It is clear that the relation "$\equiv [\lambda]$" is indeed an equivalence; the equivalence class containing the admissible pair $(f, g)$ will be called the ratio of $f$ and $g$ and will be denoted by $f \mid g$. (A ratio may accordingly be described as a measurable function from $X$ to the real projective line.) For a ratio $f \mid g$ we shall write $f \mid g \;(\epsilon)\; T^{-1}(\mathbf{T})$ $[\lambda]$ if the equivalence class $f \mid g$ contains a pair $(f_0, g_0)$ which is admissible $[\lambda]$ and for which $f_0 \;(\epsilon)\; T^{-1}(\mathbf{T})$ and $g_0 \;(\epsilon)\; T^{-1}(\mathbf{T})$.

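Lemma 8. If $\mu$, $\nu$, $\lambda_1$, and $\lambda_2$ are measures on $\mathbf{S}$ such that $\mu + \nu \ll \lambda_1$ and $\mu + \nu \ll \lambda_2$, then the pairs $(d\mu/d\lambda_1, d\nu/d\lambda_1)$ and $(d\mu/d\lambda_2, d\nu/d\lambda_2)$ are admissible $[\mu + \nu]$ and equivalent $[\mu + \nu]$.

Proof. The admissibility of, for instance, $(d\mu/d\lambda_1, d\nu/d\lambda_1)$ follows from the fact that $d\mu/d\lambda_1 \ne 0$ $[\mu]$ and $d\nu/d\lambda_1 \ne 0$ $[\nu]$, whence

$$(\mu + \nu)\left(\left\{x : \frac{d\mu}{d\lambda_1}(x) = \frac{d\nu}{d\lambda_1}(x) = 0\right\}\right) = 0.$$

To prove equivalence, we write $\lambda_1 + \lambda_2 = \lambda$. Since

$$\frac{d\mu}{d\lambda_1}\frac{d\lambda_1}{d\lambda} = \frac{d\mu}{d\lambda} = \frac{d\mu}{d\lambda_2}\frac{d\lambda_2}{d\lambda}, \qquad \frac{d\nu}{d\lambda_1}\frac{d\lambda_1}{d\lambda} = \frac{d\nu}{d\lambda} = \frac{d\nu}{d\lambda_2}\frac{d\lambda_2}{d\lambda},$$

since also $d\lambda_1/d\lambda \ne 0$ $[\lambda_1]$ and therefore $d\lambda_1/d\lambda \ne 0$ $[\mu + \nu]$, and since, similarly, $d\lambda_2/d\lambda \ne 0$ $[\mu + \nu]$, the conditions of the definition of equivalence are satisfied by $t = (d\lambda_2/d\lambda)/(d\lambda_1/d\lambda)$.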

If $\mu$ and $\nu$ are any two measures on $\mathbf{S}$ and if $\lambda$ is any measure on $\mathbf{S}$ such that $\mu + \nu \ll \lambda$ (for instance if $\lambda = \mu + \nu$), then the ratio $d\mu/d\lambda \mid d\nu/d\lambda$, which according to Lemma 8 exists $[\mu + \nu]$ and is independent of $\lambda$, will be called the likelihood ratio of $\mu$ and $\nu$ and will be denoted by $d\mu \mid d\nu$. The result of Corollary 3 may be expressed in terms of likelihood ratios as follows.

Theorem 2. A necessary and sufficient condition that $T$ be pairwise sufficient for a set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures $\mu$ and $\nu$ in $\mathfrak{M}$, $d\mu \mid d\nu \;(\epsilon)\; T^{-1}(\mathbf{T})$.

Proof. If $T$ is sufficient for $\mu$ and $\nu$, then, by Corollary 3, $d\mu/d(\mu + \nu) \;(\epsilon)\; T^{-1}(\mathbf{T})$, $d\nu/d(\mu + \nu) \;(\epsilon)\; T^{-1}(\mathbf{T})$, and, by Lemma 8, $(d\mu/d(\mu + \nu), d\nu/d(\mu + \nu))$ is an admissible pair belonging to the equivalence class $d\mu \mid d\nu$. Suppose conversely that $f = d\mu/d(\mu + \nu)$, $g = d\nu/d(\mu + \nu)$, and let the real valued measurable functions $t$, $f_0$, and $g_0$ be such that $t \ne 0$ $[\mu + \nu]$, $f_0 \;(\epsilon)\; T^{-1}(\mathbf{T})$, $g_0 \;(\epsilon)\; T^{-1}(\mathbf{T})$, $(f_0, g_0)$ is admissible $[\mu + \nu]$, and

$$f = t \cdot f_0, \qquad g = t \cdot g_0 \quad [\mu + \nu].$$

Since $f$ and $g$ are non negative, it follows that $f = |t| \cdot |f_0|$ and $g = |t| \cdot |g_0|$ $[\mu + \nu]$, i.e. that there is no loss of generality in assuming that $t$, $f_0$, and $g_0$ are non negative. The relation $f + g = 1$ $[\mu + \nu]$ implies that $t \cdot (f_0 + g_0) = 1$ $[\mu + \nu]$; the fact that $(f_0, g_0)$ is admissible $[\mu + \nu]$ then yields $t \;(\epsilon)\; T^{-1}(\mathbf{T})$. The proof is completed by comparing this result with the expressions for $f$ and $g$ in terms of $f_0$ and $g_0$ and applying Corollary 3.

8 . Pairwise sufficiency versus sufficiency. In order to show that our results 
on pairwise sufficiency (in the preceding section and in the sequel) are not 
vacuous, we proceed now to exhibit a statistic which is, for a suitable set of 
measures, pairwise sufficient but not sufficient. 

Let $X = \{(x, i) : 0 \le x \le 1,\ i = 0, 1\}$ be the union of two unit intervals and let $Y = \{y : 0 \le y \le 1\}$ be a unit interval. In accordance with our basic convention, measurability in both $X$ and $Y$ is to be taken in the sense of Borel. The statistic $T$ is defined by $T(x, i) = x$.

Write $X_0 = \{(x, 0) : 0 \le x \le 1\}$ and $X_1 = \{(x, 1) : 0 \le x \le 1\}$. Let $\mu$ be (linear) Lebesgue measure on the class $\mathbf{S}$ of Borel subsets of $X$, and define, whenever $E \in \mathbf{S}$ and $0 \le \alpha \le 1$,

$$\mu_\alpha(E) = \tfrac{1}{2}\left[\mu(E \cap X_0) + \chi_E(\alpha, 1)\right].$$

Let $\nu$ be (linear) Lebesgue measure on the class $\mathbf{T}$ of Borel subsets of $Y$, and define, whenever $F \in \mathbf{T}$ and $0 \le \alpha \le 1$,

$$\nu_\alpha(F) = \tfrac{1}{2}\left[\nu(F) + \chi_F(\alpha)\right].$$

Clearly $\nu_\alpha = \mu_\alpha T^{-1}$; we write $\mathfrak{M} = \{\mu_\alpha : 0 \le \alpha \le 1\}$.





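If $\delta(y, \alpha)$ is defined to be 1 or 0 according as $y = \alpha$ or $y \ne \alpha$, if $\delta'(y, \alpha) = 1 - \delta(y, \alpha)$, and if

$$p_\alpha(E \mid y) = \delta'(y, \alpha)\,\chi_E(y, 0) + \delta(y, \alpha)\,\chi_E(y, 1),$$

then a straightforward computation shows that

$$\mu_\alpha(E \cap T^{-1}(F)) = \int_F p_\alpha(E \mid y)\,d\nu_\alpha(y),$$

so that $p_\alpha(E \mid y) = p_{\mu_\alpha}(E \mid y)$ $[\nu_\alpha]$.

It is now easy to verify that $T$ is pairwise sufficient for $\mathfrak{M}$. Indeed if $\alpha$ and $\beta$ are any two different numbers in the closed unit interval, we may write

$$p(E \mid y) = \delta'(y, \alpha)\,\delta'(y, \beta)\,\chi_E(y, 0) + [\delta(y, \alpha) + \delta(y, \beta)]\,\chi_E(y, 1).$$

Since $\{y : p(E \mid y) \ne p_\alpha(E \mid y)\} \subset \{\beta\}$ and $\{y : p(E \mid y) \ne p_\beta(E \mid y)\} \subset \{\alpha\}$, it follows that $p(E \mid y) = p_\alpha(E \mid y)$ $[\nu_\alpha]$ and $p(E \mid y) = p_\beta(E \mid y)$ $[\nu_\beta]$.

To prove that $T$ is not sufficient for $\mathfrak{M}$ we observe that $p_\alpha(X_1 \mid y) = \delta(y, \alpha)\,\chi_{X_1}(y, 1) = \delta(y, \alpha)$ and therefore

$$p_{\mu_\alpha}(X_1 \mid y) = \delta(y, \alpha) \quad [\nu_\alpha].$$

Suppose that there is a conditional probability function $p$ such that $p(E \mid y) = p_{\mu_\alpha}(E \mid y)$ $[\nu_\alpha]$ for every $\alpha$. Then, in particular,

$$p(X_1 \mid y) = \delta(y, \alpha) \quad [\nu_\alpha].$$

Since $\nu_\alpha(\{\alpha\}) = \tfrac{1}{2} > 0$, it follows that

$$p(X_1 \mid \alpha) = \delta(\alpha, \alpha) = 1,$$

or, changing to a more suggestive notation, that $p(X_1 \mid y) = 1$ for all $y$. We have, however,

$$\nu_\alpha(\{y : p_{\mu_\alpha}(X_1 \mid y) = 0\}) = \nu_\alpha(\{y : \delta(y, \alpha) = 0\}) = \nu_\alpha(\{y : y \ne \alpha\}) = \tfrac{1}{2},$$

so that $\nu_\alpha(\{y : p(X_1 \mid y) = 0\}) = \tfrac{1}{2}$. This contradiction shows the impossibility of the existence of a conditional probability function common to every $\mu$ in $\mathfrak{M}$.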

This example shows also that, in a sense, sufficiency is more fundamental than pairwise sufficiency. If, for instance, we imagine that it is important to a statistician that he either estimate $\alpha$ sharply or refrain from estimating it altogether, then he is by no means as well off with the observation of $y$ as with that of $x$.

9. Pairwise sufficiency for dominated sets. We now proceed to show that 
for dominated sets of measures no such example as the one in the preceding 
section exists, or, in other words, that for dominated sets the concepts of pairwise 
sufficiency and sufficiency do coincide. 





Lemma 9. If $T$ is pairwise sufficient for a set $\{\mu_0, \mu_1, \mu_2\}$ of three measures on $\mathbf{S}$, then

$$\frac{d\mu_0}{d(\mu_0 + \mu_1 + \mu_2)} \;(\epsilon)\; T^{-1}(\mathbf{T}).$$

Proof. According to Corollary 3,

$$f_1 = \frac{d\mu_0}{d(\mu_0 + \mu_1)} \;(\epsilon)\; T^{-1}(\mathbf{T}) \quad \text{and} \quad f_2 = \frac{d\mu_0}{d(\mu_0 + \mu_2)} \;(\epsilon)\; T^{-1}(\mathbf{T}).$$

Since $d\mu_0 = f_1\,d(\mu_0 + \mu_1) = f_2\,d(\mu_0 + \mu_2)$, we have $f_1\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_2)$ and $f_2\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_1)$, so that

$$(f_1 + f_2 - f_1 f_2)\,d\mu_0 = f_1 f_2\,d(\mu_0 + \mu_1 + \mu_2).$$

If we write $d\mu_0 = f\,d(\mu_0 + \mu_1 + \mu_2)$, then it follows that

$$(f_1 + f_2 - f_1 f_2)\,f = f_1 f_2 \quad [\mu_0 + \mu_1 + \mu_2].$$

Since $0 \le f_1 \le 1$ and $0 \le f_2 \le 1$, the equation $f_1 + f_2 - f_1 f_2 = 0$ is equivalent to $f_1 = f_2 = 0$. Since $\mu_0(\{x : f_1(x) = f_2(x) = 0\}) = 0$, it follows that $f$ may be redefined, if necessary, to be 0 on the set $\{x : f_1(x) = f_2(x) = 0\}$ without affecting the relation $d\mu_0 = f\,d(\mu_0 + \mu_1 + \mu_2)$; since outside this set $f = f_1 f_2/(f_1 + f_2 - f_1 f_2)$, the proof of the lemma is complete.
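The closing formula of this proof, $f = f_1 f_2/(f_1 + f_2 - f_1 f_2)$, is easy to check on a finite space, where every Radon-Nikodym derivative is a quotient of point masses. The sketch below uses three hypothetical measures on a four-point space, chosen only for illustration; the function name `combined_derivative` is likewise an assumption of this example.

```python
from fractions import Fraction

# Three hypothetical measures on the four-point space {0, 1, 2, 3}, as point masses.
mu0 = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2), Fraction(0)]
mu1 = [Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 4)]
mu2 = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]

def combined_derivative(f1, f2):
    """f = f1*f2 / (f1 + f2 - f1*f2), set to 0 on the set where f1 = f2 = 0."""
    denominator = f1 + f2 - f1 * f2
    return Fraction(0) if denominator == 0 else f1 * f2 / denominator

for a, b, c in zip(mu0, mu1, mu2):
    f1 = a / (a + b)          # d(mu0)/d(mu0 + mu1) at this point
    f2 = a / (a + c)          # d(mu0)/d(mu0 + mu2) at this point
    direct = a / (a + b + c)  # d(mu0)/d(mu0 + mu1 + mu2) at this point
    assert combined_derivative(f1, f2) == direct
```

The last point of the space has $\mu_0$-mass zero, so it exercises the convention $f = 0$ where $f_1 = f_2 = 0$.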

Lemma 10. If $T$ is pairwise sufficient for a finite set $\{\mu_0, \mu_1, \cdots, \mu_k\}$ of measures on $\mathbf{S}$, then $d\mu_0/d(\sum_{i=0}^{k} \mu_i) \;(\epsilon)\; T^{-1}(\mathbf{T})$.

Proof. For $k = 1$ the conclusion is a restatement of the hypothesis; we proceed by induction. Given $\mu_0, \mu_1, \cdots, \mu_{k+1}$, we write $\mu = \sum_{i=1}^{k} \mu_i$. Then $d\mu_0/d(\mu_0 + \mu) \;(\epsilon)\; T^{-1}(\mathbf{T})$ by the induction hypothesis and $d\mu_0/d(\mu_0 + \mu_{k+1}) \;(\epsilon)\; T^{-1}(\mathbf{T})$ by Corollary 3. Lemma 9 may then be applied to $\{\mu_0, \mu, \mu_{k+1}\}$ and yields the desired conclusion.

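Lemma 11. If $\{\mu_0, \mu_1, \mu_2, \cdots\}$ is a sequence of measures on $\mathbf{S}$ such that $\sum_{i=0}^{\infty} \mu_i(X) < \infty$, if, for every $E$ in $\mathbf{S}$, $\mu(E) = \sum_{i=0}^{\infty} \mu_i(E)$, and if $\lambda$ is a measure on $\mathbf{S}$ such that $\mu_i \ll \lambda$ for $i = 0, 1, 2, \cdots$, then

$$\lim_k \frac{d(\sum_{i=0}^{k} \mu_i)}{d\lambda} = \frac{d\mu}{d\lambda} \quad [\lambda].$$

Proof. Since $0 \le d(\sum_{i=0}^{k} \mu_i)/d\lambda = \sum_{i=0}^{k} (d\mu_i/d\lambda) \le d\mu/d\lambda$ $[\lambda]$, the series $\sum_{i=0}^{\infty} (d\mu_i/d\lambda)$ does indeed converge to a measurable function $f$ $[\lambda]$. Since, for every $E$ in $\mathbf{S}$,

$$\int_E f\,d\lambda = \sum_{i=0}^{\infty} \int_E \frac{d\mu_i}{d\lambda}\,d\lambda = \sum_{i=0}^{\infty} \mu_i(E) = \mu(E),$$

we have $f = d\mu/d\lambda$ $[\lambda]$, as stated.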


11 In view of Theorem 1, Lemma 9 asserts that if $T$ is pairwise sufficient for a set $\mathfrak{M}$ of three elements, then $T$ is sufficient for $\mathfrak{M}$. Lemmas 10 and 12 extend this result to finite and countably infinite sets $\mathfrak{M}$ respectively. Since every countable set of measures is dominated, the final result, Theorem 3, contains all these preliminaries as special cases.





Lemma 12. If $\{\mu_0, \mu_1, \mu_2, \cdots\}$ is a sequence of measures on $\mathbf{S}$ such that $\sum_{i=0}^{\infty} \mu_i(X) < \infty$ and if, for every $E$ in $\mathbf{S}$, $\mu(E) = \sum_{i=0}^{\infty} \mu_i(E)$, then

$$\lim_k \frac{d\mu_0}{d(\sum_{i=0}^{k} \mu_i)} = \frac{d\mu_0}{d\mu} \quad [\mu].$$

If, in addition, $T$ is pairwise sufficient for the sequence $\{\mu_0, \mu_1, \mu_2, \cdots\}$, then $d\mu_0/d\mu \;(\epsilon)\; T^{-1}(\mathbf{T})$.

Proof. We have, for $k = 0, 1, 2, \cdots$,

$$\frac{d\mu_0}{d(\sum_{i=0}^{k} \mu_i)} \cdot \frac{d(\sum_{i=0}^{k} \mu_i)}{d\mu} = \frac{d\mu_0}{d\mu}.$$

If we write $\lambda = \mu$, then the hypotheses of Lemma 11 are satisfied and, consequently, the second factor on the left side converges to 1 $[\mu]$; it follows that the first factor converges to $d\mu_0/d\mu$ $[\mu]$. The second assertion of the lemma follows from Lemma 10.

Theorem 3. A necessary and sufficient condition that $T$ be sufficient for a dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that $T$ be pairwise sufficient for $\mathfrak{M}$.

Proof. The necessity of the condition is obvious. To prove its sufficiency, let $\mathfrak{N} = \{\mu_1, \mu_2, \cdots\}$ be a countable subset of $\mathfrak{M}$ which is equivalent to $\mathfrak{M}$ (Lemma 7), and let $\mu_0$ be an arbitrary measure in $\mathfrak{M}$. Since the sufficiency or pairwise sufficiency of $T$ remains unaltered if some or all of the measures in $\mathfrak{M}$ are replaced by positive constant multiples of themselves, we may assume that $\sum_{i=0}^{\infty} \mu_i(X) < \infty$. If we write, for every $E$ in $\mathbf{S}$, $\lambda(E) = \sum_{i=1}^{\infty} \mu_i(E)$, then the pairwise sufficiency of $T$ and Lemma 12 imply that $d\mu_0/d(\mu_0 + \lambda) \;(\epsilon)\; T^{-1}(\mathbf{T})$. The relation

$$\frac{d\mu_0}{d\lambda} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \cdot \frac{d(\mu_0 + \lambda)}{d\lambda} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \left(\frac{d\lambda}{d(\mu_0 + \lambda)}\right)^{-1} = \frac{d\mu_0}{d(\mu_0 + \lambda)} \left(1 - \frac{d\mu_0}{d(\mu_0 + \lambda)}\right)^{-1}$$

implies that $d\mu_0/d\lambda \;(\epsilon)\; T^{-1}(\mathbf{T})$; an application of Theorem 1 concludes the proof.

A comparison of Theorems 1 and 2 and Corollary 3 yields immediately the 
following consequence of Theorem 3. 

Corollary 4. A necessary and sufficient condition that the statistic $T$ be sufficient for a dominated set $\mathfrak{M}$ of measures on $\mathbf{S}$ is that, for any two measures $\mu$ and $\nu$ in $\mathfrak{M}$, $d\mu/d(\mu + \nu) \;(\epsilon)\; T^{-1}(\mathbf{T})$, or, equivalently, $d\mu \mid d\nu \;(\epsilon)\; T^{-1}(\mathbf{T})$.

10. The value of sufficient statistics in statistical methodology. We gather 
from conversations with some able and prominent mathematical statisticians 
that there is doubt and disagreement about just what a sufficient statistic is 
sufficient to do, and in particular about in what sense if any it contains “all the 
information in a sample.” We therefore conclude this paper with a brief 
explanation of a point of view which, while not original with us, has not received 
due publicity. 





Suppose a statistician $\mathcal{S}$ is to be shown an observation $x$ drawn at random from some sample space $(X, \mathbf{S})$ on which an unknown measure, $\mu$, of a set $\mathfrak{M}$ of possible measures obtains, while for the same observation $x$ another statistician $\mathcal{T}$ is only to be shown the value $T(x)$ of some statistic $T$ sufficient for $\mathfrak{M}$. It is clear that $\mathcal{S}$ is as well off as $\mathcal{T}$; we shall argue that $\mathcal{T}$ is also as well off as $\mathcal{S}$.

Suppose $\mathcal{S}$ has decided how to use his datum, that, in other words, he has decided just what he will do (or, in particular, say) in the event of each possible $x$. His program can then be described schematically by saying that he has selected some function $f$ (of the points $x$) which, without serious loss of generality, may be supposed to take real values. Now $\mathcal{S}$'s only real concern is for the probability distribution of $f$ given $\mu$, i.e. for the function $\varphi$ of a real variable $c$, defined by

$$\varphi(c) = \mu(\{x : f(x) < c\}) = \mu(E(c)).$$

But $\mathcal{T}$ can if he wishes achieve exactly the same results as $\mathcal{S}$, in the following way. Let him, on learning the value of $T(x)$, select a real number $f$, with the aid of a "random machine" which produces numerical values according to the known distribution function $\psi$, defined by

$$\psi(c) = p(E(c) \mid T(x)).$$

Then, for any $\mu$ in $\mathfrak{M}$, the probability that $\mathcal{T}$ will select a value less than $c$ is

$$\int p(E(c) \mid y)\,d\mu T^{-1}(y) = \mu(E(c)) = \varphi(c).$$

Thus $\mathcal{T}$ is at no disadvantage, save for the mechanical one of having to manipulate a random machine, and he may fairly be said to have as much information as $\mathcal{S}$.
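The random-machine argument can be traced exactly on a small discrete case. The sketch below is an illustration under assumptions made for this example: sequences of 4 coin spins with head probability either 1/2 or 1/4, the statistic "number of heads", and a hypothetical decision function `f` (index of the first head). It compares the distribution of `f` computed from the full observation with the distribution obtained by integrating the parameter-free conditional probability against the distribution of the statistic.

```python
from fractions import Fraction
from itertools import product
from math import comb

n = 4
sequences = list(product((0, 1), repeat=n))

def mu(x, p):
    """Probability of sequence x when each spin shows a head with probability p."""
    return p**sum(x) * (1 - p)**(n - sum(x))

def f(x):
    """Hypothetical decision function: index of the first head, or n if none."""
    return x.index(1) if 1 in x else n

def conditional_probability(event, t):
    """p(E | t): share of the sequences with t heads that lie in E (free of p)."""
    return Fraction(sum(1 for x in event if sum(x) == t), comb(n, t))

def phi_direct(c, p):
    """Distribution function of f as seen by the statistician shown x."""
    return sum(mu(x, p) for x in sequences if f(x) < c)

def phi_via_statistic(c, p):
    """Same quantity for the statistician shown only T(x) and a random machine."""
    event = [x for x in sequences if f(x) < c]
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) * conditional_probability(event, t)
               for t in range(n + 1))

for p in (Fraction(1, 2), Fraction(1, 4)):
    for c in range(n + 2):
        assert phi_direct(c, p) == phi_via_statistic(c, p)
```

The conditional probability never refers to `p`, which is what allows the second statistician to build the machine without knowing which measure obtains.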

As a matter of fact we know of no practical situation in which $\mathcal{T}$ would actually go to the trouble of using a random machine. There are some situations in which he should in principle do so, but in which practical statisticians have not, so far as we know, thought it worth while. If, for example, an outcome consists of a sequence of $n$ heads and tails resulting from $n$ spins of a coin the heads ratio of which is known to be either one half or one quarter, then a sufficient statistic is the number of heads which occur in the sequence. In basing a decision on the outcome of this program both $\mathcal{S}$ and, to a still greater extent, $\mathcal{T}$ have (according to Wald's theory of minimum risk) something to gain by recourse to a random machine. There are, on the other hand, many technical desiderata which sufficient statistics meet exactly without recourse to random machines. Thus, as Blackwell has shown,$^{18}$ if $\mathcal{S}$ has an unbiased estimate, $R$, of some parameter, $\mathcal{T}$ can find a function $R^*$, defined by $R^*(y) = e(R \mid y)$, which is an unbiased estimate of that parameter, with variance not greater than that of $R$. More generally, if $R$ is any estimate with finite mean square deviation from a parameter, then it is easy to show with Blackwell's methods that $R^*$ has no larger a mean square deviation than $R$. Finally it is a well known fact that, under suitable hypotheses, if there exists a maximum likelihood estimate $R$ of some parameter, then $R$ depends only on $y$.

18 D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.
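Blackwell's variance comparison can be reproduced exactly in the coin-spinning setting. The sketch below is an illustration under assumptions chosen for this example: 4 spins, the crude unbiased estimate `R` equal to the outcome of the first spin, and the statistic "number of heads". The conditioned estimate `R_star` is computed by averaging `R` over the sequences sharing a head count, which are equally likely given that count.

```python
from fractions import Fraction
from itertools import product

n = 4
sequences = list(product((0, 1), repeat=n))

def mu(x, p):
    """Probability of sequence x when each spin shows a head with probability p."""
    return p**sum(x) * (1 - p)**(n - sum(x))

def R(x):
    """Crude unbiased estimate of p: the outcome of the first spin alone."""
    return Fraction(x[0])

def R_star(t):
    """e(R | y): average of R over the equally likely sequences with t heads."""
    matching = [x for x in sequences if sum(x) == t]
    return sum(R(x) for x in matching) / len(matching)

def mean_and_variance(estimate, p):
    """Exact mean and variance of an estimate under head probability p."""
    mean = sum(mu(x, p) * estimate(x) for x in sequences)
    variance = sum(mu(x, p) * (estimate(x) - mean) ** 2 for x in sequences)
    return mean, variance

for p in (Fraction(1, 2), Fraction(1, 4)):
    mean_r, var_r = mean_and_variance(R, p)
    mean_star, var_star = mean_and_variance(lambda x: R_star(sum(x)), p)
    assert mean_r == mean_star == p   # both estimates are unbiased
    assert var_star <= var_r          # conditioning does not increase variance
```

In this case `R_star` reduces to the observed proportion of heads.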

We think that confusion has from time to time been thrown on the subject 
by (a) the unfortunate use of the term "sufficient estimate,” (b) the undue 
emphasis on the factorability of sufficient statistics, and (c) the assumption 
that a sufficient statistic contains all the information in only the technical 
sense of “information” as measured by variance. 



ON DESIGNING SINGLE SAMPLING INSPECTION PLANS 

By Frank E. Grubbs 

Ballistic Research Laboratories, Aberdeen Proving Ground, Md.

1. Summary. In designing single sampling inspection plans, a problem is to find the acceptance number, $c$, and the smallest sample size, $n$, such that if the fraction defective of the material inspected is equal to an acceptable value, $p_1$, a large percentage, say, 95% of such lots will be accepted under the sample criteria, whereas if the fraction defective of the material inspected is objectionable and equal to $p_2$ (where $p_1 < p_2$), then a large percentage, say, 90% of such lots will be rejected. A solution to this problem for the case where the lot size is large compared to the sample size is given in this paper and tables are provided for quick determination of the sample size $n$ and acceptance number $c$.

2. Introduction. In sampling inspection of material one practice is to set an acceptable quality level $= p_1$, say, such that the consumer desires to accept practically all (95% or more) of lots of fraction defective $p_1$ or less (and hence desires to reject at most a maximum of about 5% of lots which are of quality $p_1$ or better) and to set also an objectionable fraction defective $= p_2$, say, which represents quality so poor that the consumer cannot afford to accept more than about 10% or less of lots of this quality or poorer.$^1$ From the standpoint of the producer, he should have very few rejections, 5% or less, for his submitted lots the fractions defective of which are equal to or better (less) than $p_1$, whereas he should be willing and also expect to suffer increasingly more rejections if his process average percent defective departs from the acceptable quality level $p_1$ toward poor or objectionable quality. In this connection, if we are given $p_1$ an acceptable quality level, $p_2$ an objectionable percent defective, the risk $\alpha = 5\%$ of rejecting a lot of fraction defective $p_1$, and the risk $\beta = 10\%$ of accepting a lot of the objectionable fraction defective $p_2$, a problem of importance in single sampling inspection is to find the smallest sample size $n$ and the acceptance number $c$ which will approximate closely the protection stated above. Due to the discrete nature of $n$ and $c$, it is not usually possible to find $n$ and $c$ such that precisely the above protection is guaranteed; however, it is possible to pick that single sampling plan which, for all practical purposes, gives the desired protection, i.e. it is possible to select that single sampling plan which more nearly satisfies
1 When this paper was first presented for publication, the percent defectives $p_1$ and $p_2$ were labeled "Acceptable Quality Level" and "Lot Tolerance Percent Defective," respectively. In view of the suggestions of H. G. Romig and H. F. Dodge, strict reference to these particular terms has been avoided in order that the percent defectives $p_1$ and $p_2$ would appear in a more generalized form. This recommendation is considered especially desirable in view of the fact that Table I and Table II of the paper are percentage points of the Binomial Distribution and hence are useful in problems other than that of designing single sampling inspection plans.


242 



DESIGNING SAMPLING PLANS 


243 


the above protection requirements than any other plan. The values of $n$ and $c$ can be found simply by looking for an entry in Table I below which is close to $p_1$ and an entry in Table II close to $p_2$ such that column heading $c$ and row heading $n$ in Table I correspond exactly with the respective column and row headings in Table II. For the sample sizes $n$, acceptance numbers $c$ and quality levels $p$ covered in Tables I and II, the above procedure makes unnecessary any computation of or any approximation to the sample size and acceptance number. It will be noticed, however, that usually the proper choice of $c$ is clear whereas some slight judgment may be necessary in selecting $n$.

It is remarked also that Tables I and II solve the equivalent problem of finding $n$ and $c$ in connection with testing the hypothesis $H_0$ that the fraction defective of the Binomial population sampled is $p_1$ or less as against an alternative hypothesis $H_1$ which states that the fraction defective of the lot, population, process, etc., sampled is $p_2$ or greater ($p_2 > p_1$), where $\alpha = .05$ is the maximum risk of erroneously rejecting $H_0$ when it is true and $\beta = .10$ is the maximum risk of erroneously accepting $H_0$ when the alternative $H_1$ is true.

The solution to the problem of finding an appropriate single sampling plan in this paper is given by solving the infinite case, i.e. by assuming the lot to be an infinite Binomial population. In practice lots are of finite size. However, it is well known that Binomial probabilities (infinite universe) give excellent practical approximations to Hypergeometric probabilities (finite lot) provided the sample size is only a small percentage of the lot size. Hence, the reader is warned in using the tables for sampling inspection problems that the lot size should be at least 10 or 15 times the sample size.

3. Basis for construction of Table I and Table II. It is well known that if $P(c, n, p)$ represents the probability of obtaining $c$ or less defectives in a random sample of size $n$ from a Binomial Population of fraction defective $p$, then the relation between $P(c, n, p)$ and the Incomplete Beta Function Ratio is given by

$$P(c, n, p) = I_{1-p}(n - c,\, c + 1) = \frac{1}{B(n - c,\, c + 1)} \int_0^{1-p} x^{n-c-1} (1 - x)^{c}\,dx. \tag{1}$$

Consequently, using a table of percentage points for the Incomplete Beta Function [1], values of $p_1$ can be found for Table I such that

$$P(c, n, p_1) = .95,$$

and values of $p_2$ can be found for Table II presented at the end such that

$$P(c, n, p_2) = .10.$$

Also, Table I and Table II can be computed by using percentage points of the F-distribution [2]. Upon making the transformation

$$x = \frac{2(n - c)}{2(n - c) + 2(c + 1)F}$$



244 


FRANK E. GRUBBS 


in (1) above to the F-distribution, we obtain easily that

$$P(c, n, p) = \frac{1}{B(c + 1,\, n - c)} \int_{(n-c)p/[(c+1)q]}^{\infty} [2(c + 1)]^{c+1}\,[2(n - c)]^{n-c}\,F^{c}\,[2(n - c) + 2(c + 1)F]^{-n-1}\,dF, \tag{2}$$

where $q = 1 - p$.

With the aid of a table of percentage points of the F-distribution [2], we may determine for various combinations of $n - c$ and $c + 1$ those values of $p$ such that

$$P(c, n, p_1) = .95 \text{ for Table I;}$$

and

$$P(c, n, p_2) = .10 \text{ for Table II.}$$

In fact, if $P(c, n, p) = \alpha$, then

$$\frac{(n - c)\,p}{(c + 1)\,q} = F_\alpha\{2(c + 1),\, 2(n - c)\},$$

or

$$p = \frac{(c + 1)\,F_\alpha\{2(c + 1),\, 2(n - c)\}}{(n - c) + (c + 1)\,F_\alpha\{2(c + 1),\, 2(n - c)\}},$$

for which relation values of $p_1$ for $\alpha = .95$ are given in Table I below and values of $p_2$ for $\alpha = .10$ are given in Table II.
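The tabulated values rest on percentage points of the F-distribution, which are not available in the Python standard library. As a substitute technique, clearly different from the paper's, the sketch below solves $P(c, n, p) = \alpha$ for $p$ by bisection on the binomial sum itself; the function names and the choice $n = 37$, $c = 1$ are assumptions made for illustration.

```python
from math import comb

def acceptance_probability(c, n, p):
    """P(c, n, p): probability of c or fewer defectives in a binomial sample of size n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def fraction_defective(c, n, alpha, tolerance=1e-12):
    """Solve P(c, n, p) = alpha for p by bisection (P decreases in p when c < n)."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        middle = (low + high) / 2
        if acceptance_probability(c, n, middle) > alpha:
            low = middle    # acceptance still too likely: the root lies to the right
        else:
            high = middle
    return (low + high) / 2

p1 = fraction_defective(c=1, n=37, alpha=0.95)   # analogue of a Table I entry
p2 = fraction_defective(c=1, n=37, alpha=0.10)   # analogue of a Table II entry
```

Bisection is used here only because it needs nothing beyond the binomial sum; a routine for F or incomplete beta percentage points would follow the paper's route more closely.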

Although the 95% points are not given directly in [2], they are easily obtainable from the relation

$$F_{.95}(\nu_1, \nu_2) = \frac{1}{F_{.05}(\nu_2, \nu_1)}.$$

Interpolation was required for the great majority of the entries in Tables I and II. The values given were obtained by harmonic or linear interpolation using References [1] and [2] and are believed accurate to within one unit in the last place.

It will be noticed that if the chosen acceptable quality level, $p_1$, is greater than the appropriate tabulated value in Table I for the single sampling plan $(n, c)$, then the operating characteristic curve will pass below the point $(p_1, .95)$. That is, the risk of rejection under the sampling plan for lots of fraction defective $p_1$ will be somewhat more than 5%. On the other hand, if a selected acceptable quality level $p_1$ is less than the appropriate entry in Table I, the risk of rejection for a product of fraction defective $p_1$ will be less than 5%. Similar considerations apply also to the fractions defective, $p_2$, in Table II.


satn Pl' n g plans based on the Poisson approximation to the binomial. 

Tables I and II are useful for determination of a single sampling plan when the 



DESIGNING SAMPLING PLANS 


245 


desired percent defectives are listed and n does not exceed 150. Table III 
is particularly useful in designing a single sampling plan when we are interested 
in fractions defective not greater than about .10. A somewhat similar procedure 
has already been suggested by Peach and Littauer [3). If we designate by 
P(c, a) the sum of individual Poisson probabilities, 


Pie, a) 


io'ml ’ 


then Table III gives values ai ~ npi of a for which 

P(c, ai) = .95 

and values a 2 =* npi of a for which 

P(c, at) = .10. 

Hence, to find the single sampling plan whose operating characteristic curve
passes nearly through the points ($p_1$, .95) and ($p_2$, .10) one merely divides
values of $a_1$ in Table III for various values of c by the acceptable quality level $p_1$
and divides values of $a_2$ in Table III by the objectionable percent defective $p_2$.
Then the acceptance number c is picked for which $a_1/p_1$ most nearly equals $a_2/p_2$
and the approximate sample size n may be determined by rounding to an integer
the average of the two approximately equal numbers $a_1/p_1$ and $a_2/p_2$.
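The Poisson procedure just described can be sketched in a few lines. This is an illustrative sketch only: the names `poisson_cdf`, `poisson_mean_for` and `design_plan` are hypothetical, the values $a_1$ and $a_2$ are found by bisection rather than read from Table III, and "most nearly equals" is taken to mean the smallest absolute difference between $a_1/p_1$ and $a_2/p_2$.

```python
from math import exp

def poisson_cdf(c, a):
    """P(c, a): sum of the Poisson probabilities e^{-a} a^m / m! for m = 0..c."""
    term = total = exp(-a)
    for m in range(1, c + 1):
        term *= a / m
        total += term
    return total

def poisson_mean_for(c, prob, tol=1e-10):
    """Value of a for which P(c, a) = prob (P decreases as a increases)."""
    lo, hi = 0.0, 1.0
    while poisson_cdf(c, hi) > prob:   # widen the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poisson_cdf(c, mid) > prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def design_plan(p1, p2, max_c=15):
    """Pick c where a_1/p_1 and a_2/p_2 are closest; n is their rounded average."""
    best = None
    for c in range(max_c + 1):
        n1 = poisson_mean_for(c, 0.95) / p1
        n2 = poisson_mean_for(c, 0.10) / p2
        gap = abs(n1 - n2)
        if best is None or gap < best[0]:
            best = (gap, c, round((n1 + n2) / 2))
    return best[1], best[2]

c, n = design_plan(0.01, 0.10)
print(c, n)
```

The upper limit `max_c=15` simply mirrors the range of acceptance numbers covered by Table III.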


5. Example on the use of Tables I, II, III. Given an acceptable percent
defective or quality level of .01 and an objectionable quality level of .10, it is
desired to find the single sampling plan which will accept 95% of product which is
of quality $p_1$ = .01 and which will reject 90% (or accept only 10%) of product of
quality $p_2$ = .10. Looking in Table I for entries $p_1$ which approximately equal
.01 and in Table II for entries $p_2$ which approximately equal .10 such that the
c and n of Tables I and II correspond, we see that c must be equal to 1 whereas n
may take possibly any one of the values 35, 36, 37, 38. In this connection, we
have to set up some criterion for the choice of n. Although any of several criteria
may be used, a reasonable criterion appears to involve picking n such that the
sum of the absolute departures of the Operating Characteristic Curve from the
risks $\alpha$ = .05 at $p_1$ and $\beta$ = .10 at $p_2$ is a minimum. This may be determined by
using appropriate tables of Binomial Probabilities or by computing at $p_1$ and $p_2$
the chance of obtaining c or less defectives in n for the various possible combinations
of c and n. If the above criterion were applied to the present example,
the combination c = 1 and n = 37 would be selected, i.e., the single sampling
plan would be c = 1, n = 37. For this sampling plan, the probability of passing
at $p_1$ = .01 is .9471 and the probability of passing at $p_2$ = .10 is .1036. For
the sake of expediency, another proposal would be merely to select somewhat of a
“middle” value of n especially when the variation in sample size is slight.
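The criterion of the example can be reproduced directly from the binomial distribution. The sketch below is illustrative only (the helper names `prob_accept` and `choose_n` are not from the paper); it evaluates the operating characteristic at $p_1$ and $p_2$ for each candidate n and keeps the n with the smallest sum of absolute departures from .95 and .10.

```python
from math import comb

def prob_accept(c, n, p):
    """Chance of obtaining c or fewer defectives in a sample of n (binomial)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

def choose_n(c, candidates, p1, p2, alpha=0.05, beta=0.10):
    """Pick n minimizing |P(c,n,p1) - (1 - alpha)| + |P(c,n,p2) - beta|."""
    def departure(n):
        return (abs(prob_accept(c, n, p1) - (1 - alpha))
                + abs(prob_accept(c, n, p2) - beta))
    return min(candidates, key=departure)

best = choose_n(1, range(35, 39), 0.01, 0.10)
print(best,
      round(prob_accept(1, best, 0.01), 4),
      round(prob_accept(1, best, 0.10), 4))
# prints: 37 0.9471 0.1036
```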

If we use Table III for the above example, we can select n and c with the aid
of the following simple tabulation:

| c | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| $a_1/p_1$ | 5.1 | 35.5 | 81.8 | 136.6 |
| $a_2/p_2$ | 23.0 | 38.9 | 53.2 | 66.8 |

Since the sample sizes “cross” at c = 1, we would select c = 1 and n = 1/2(35.5 + 38.9) = 37.2, or n = 37.

A use of Table I of some practical importance is in determining at a glance
those values of p for which the probability of obtaining c or less defectives in a
sample of n is equal to .95. As a matter of fact, a series of tables similar to
Table I and Table II for which P(c, n, p) = .99, .95, .90, .10, .05, .01, etc. would
be of considerable practical use.

Acknowledgment. The author is indebted to Miss Helen J. Coon for carrying
out the computations for the tables.













TABLE I

Values of p = $p_1$ such that P(c, n, $p_1$) = .95

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .0500 | | | | | | | | | |
| 2 | .0253 | .224 | | | | | | | | |
| 3 | .0170 | .135 | .368 | | | | | | | |
| 4 | .0127 | .0970 | .249 | .473 | | | | | | |
| 5 | .0102 | .0764 | .189 | .343 | .549 | | | | | |
| 6 | .00851 | .0628 | .153 | .271 | .418 | .607 | | | | |
| 7 | .00730 | .0534 | .129 | .225 | .341 | .479 | .652 | | | |
| 8 | .00639 | .0464 | .111 | .193 | .289 | .400 | .529 | .688 | | |
| 9 | .00568 | .0410 | .0978 | .169 | .251 | .345 | .450 | .571 | .717 | |
| 10 | .00512 | .0368 | .0873 | .150 | .222 | .304 | .393 | .493 | .606 | .741 |
| 11 | .00465 | | .0788 | .135 | .200 | .271 | .350 | .436 | .530 | .636 |
| 12 | .00427 | | .0719 | .123 | .181 | .245 | .315 | .391 | .473 | .562 |
| 13 | .00394 | .0281 | .0660 | .113 | .166 | .224 | .287 | .355 | .427 | .505 |
| 14 | .00366 | .0260 | .0611 | .104 | .153 | .206 | .264 | .325 | .390 | .460 |
| 15 | .00341 | .0242 | .0568 | .0967 | .142 | .191 | .244 | .300 | .360 | .423 |
| 16 | .00320 | | .0531 | .0903 | .132 | .178 | .227 | .279 | .333 | .391 |
| 17 | .00301 | | .0499 | .0846 | .124 | .166 | .212 | .260 | .311 | .364 |
| 18 | .00285 | | .0470 | .0797 | .116 | .156 | .199 | .244 | .291 | .341 |
| 19 | .00270 | .0190 | .0445 | .0753 | .110 | .147 | .188 | .230 | .274 | .320 |
| 20 | .00256 | .0181 | .0422 | .0714 | .104 | .140 | .177 | .217 | .259 | .302 |
| 21 | .00244 | .0172 | .0401 | .0678 | | .132 | .168 | .206 | .245 | .286 |
| 22 | .00233 | .0164 | .0382 | .0646 | .0941 | .126 | .160 | .196 | .233 | .271 |
| 23 | .00223 | .0157 | .0365 | .0617 | .0898 | .120 | .152 | .186 | .222 | .258 |
| 24 | .00213 | .0150 | .0350 | .0590 | .0859 | .115 | .146 | .178 | .212 | .246 |
| 25 | .00205 | .0144 | .0335 | .0566 | | .110 | .139 | .170 | .202 | .236 |
| 26 | .00197 | .0138 | .0322 | .0543 | | .106 | .134 | .163 | .194 | .226 |
| 27 | .00190 | .0133 | .0310 | .0522 | .0759 | .101 | .129 | .157 | .186 | .217 |
| 28 | | .0128 | .0298 | .0503 | .0731 | | .124 | .151 | .179 | .208 |
| 29 | .00177 | .0124 | .0288 | .0485 | | | .119 | .145 | .172 | .200 |
| 30 | | .0120 | .0278 | .0469 | .0681 | | .115 | .140 | .167 | .193 |
| 31 | | .0116 | .0269 | .0453 | .0658 | | .111 | .135 | .161 | .187 |
| 32 | | .0112 | .0260 | .0438 | .0637 | | .107 | .131 | .155 | .180 |
| 33 | | .0109 | .0252 | .0425 | .0617 | | .104 | .127 | .150 | .175 |
| 34 | | .0106 | .0245 | .0412 | .0598 | | .101 | .123 | .146 | .169 |
| 35 | .00146 | .0102 | .0238 | .0400 | .0580 | | .0978 | .119 | .141 | .164 |



























TABLE I —Continued 







C 









i 

■ 1 

3 

4 

5 

G 

7 

8 

9 


36 


iggi 


S| 


.0752 

.0950 

.11(5 

.137 

.159 

30 

37 

iHnUI 


9 


.0548 

.0731 

0923 

.112 

.133 

.155 

37 

38 

.00135 

M 



.0533 

.0711 

,0898 

ITiHl 

. 130 

.150 

38 

39 






.0692 

WM 

.106 

.126 

.146 

39 

40 





m 


1 

.104 


.142 

40 

41 

.00125 




IIS 

.0657 

.0830 

1 


.139 

41 

42 




Sjpyis* 


.0641 



.117 

.135 

42 

43 




M 

15'aSl 

.0626 

■ is 


.114 

.132 

43 

44 

.00117 

.00814 


Ifijll 


.0611 

.0771 

shBI 

.111 

.129 

44 

45 





.0448 

.0597 


lip 


.126 

45 

46 


ill 



||fSS 

.0584 


H| 


.123 

46 

47 






.0571 

.0720 

ioSwsf 

.104 

.120 

47 

48 


mtma 



ipSn 

.0559 


.0857 

mi 

.118 

48 

49 




J 

ISf B 

.0547 



23 

.115 

49 

50 






.0536 

0676 


.0972 

.113 

50 

51 




if 1 


.0525 




.110 

61 

52 




§ : \» 

gt|8 

.0515 

21 

.0789 

pnlfs! 

.108 

52 

53 










.106 

53 

54 






.0495 

ip* 



.104 

54 

55 


1 



E3 

.0486 

E|| 

B 


.102 

55 

56 






.0477 



.0865 

.100 

56 

57 






,0468 

iil 

.0718 

.0849 

.0984 

57 

58 







BBS 

.0705 

.0834 

.0966 

58 

59 


liSlS 




,0452 

.0570 


.0820 

.0949 

59 

60 


.00595 



0334 

.0445 

.0561 


.0806 

.0933 

60 

61 





m 

m 

m 


.0792 

.0917 

01 

62 





B 

K 

rSS 


.0779 

.0902 

62 

63 





RUE 

.0423 

.0533 


.0766 

.0887 

63 

64 





.0313 

mm 

mm 

.0637 

.0764 

.0873 

64 

65 






.0410 


||j 

.0742 

.0859 

05 

66 






wm 

rjjeaja 



.0730 

.0840 

66 

67 


»o|||| 

.0123 


gH 

.0397 

.0501 

Bwr 

.0719 

.0833 

67 

68 



mu 


0294 

B2® 

mm 


.0708 

.0820 

68 

69 


E 

B 


B 

.0385 

.0486 

■Pn^ 1 

.0698 

.0808 

69 

70 



m 


m 

.038C 

.0470 


.0687 

.0796 

70 

































































































TABLE I—Continued

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 71 | | .00503 | .0116 | .0195 | .0282 | .0374 | .0472 | .0573 | .0678 | .0785 |
| 72 | | .00496 | .0115 | .0192 | .0278 | .0369 | .0465 | .0565 | .0668 | .0773 |
| 73 | | | .0113 | .0189 | .0274 | .0364 | .0459 | .0557 | .0658 | .0762 |
| 74 | | .00482 | .0111 | .0187 | .0270 | .0359 | .0452 | .0549 | .0649 | .0752 |
| 75 | | | .0110 | .0184 | .0266 | .0354 | .0446 | .0542 | .0641 | .0742 |
| 76 | .000675 | .00470 | .0108 | .0182 | .0263 | .0349 | .0440 | .0535 | .0632 | .0732 |
| 77 | .000666 | | .0107 | .0179 | .0259 | .0345 | .0434 | .0528 | .0623 | .0722 |
| 78 | .000657 | | .0106 | .0177 | .0256 | .0340 | .0429 | .0521 | .0615 | .0712 |
| 79 | .000649 | | .0104 | .0175 | .0253 | .0336 | .0423 | .0514 | .0607 | .0703 |
| 80 | .000641 | | .0103 | .0173 | .0249 | .0332 | .0418 | .0507 | .0600 | .0694 |
| 81 | .000633 | .00440 | .0102 | .0170 | .0246 | .0328 | .0413 | .0501 | .0592 | .0685 |
| 82 | .000625 | .00435 | .0100 | .0168 | .0243 | .0323 | .0408 | .0495 | .0585 | .0677 |
| 83 | .000618 | .00430 | .00992 | .0166 | .0240 | .0319 | .0403 | .0489 | .0577 | .0668 |
| 84 | .000610 | .00425 | .00980 | .0164 | .0237 | .0316 | .0398 | .0483 | .0570 | .0660 |
| 85 | .000603 | .00420 | .00969 | .0162 | .0235 | .0312 | .0393 | .0477 | .0564 | .0652 |
| 86 | .000596 | .00415 | .00957 | .0160 | .0232 | .0308 | .0388 | .0471 | .0557 | .0645 |
| 87 | .000589 | .00410 | .00946 | .0159 | .0229 | .0305 | .0384 | .0466 | .0550 | .0637 |
| 88 | .000583 | .00405 | .00936 | .0157 | .0227 | .0301 | .0379 | .0460 | .0544 | .0630 |
| 89 | .000576 | .00401 | .00925 | .0155 | .0224 | .0298 | .0375 | .0455 | .0538 | .0622 |
| 90 | .000570 | .00396 | .00915 | .0153 | .0221 | .0294 | .0371 | .0450 | .0532 | .0615 |
| 91 | .000564 | .00392 | .00904 | .0152 | .0219 | .0291 | .0367 | .0445 | .0526 | .0608 |
| 92 | .000557 | .00388 | .00895 | .0150 | .0217 | .0288 | .0363 | .0440 | .0520 | .0602 |
| 93 | .000551 | .00383 | .00885 | .0148 | .0214 | .0285 | .0359 | .0435 | .0514 | .0595 |
| 94 | .000546 | .00379 | .00875 | .0147 | .0212 | .0282 | .0355 | .0431 | .0509 | .0589 |
| 95 | .000540 | .00375 | .00866 | .0145 | .0210 | .0279 | .0351 | .0426 | .0503 | .0582 |
| 96 | .000534 | .00371 | .00857 | .0144 | .0207 | .0276 | .0347 | .0421 | .0498 | .0576 |
| 97 | .000529 | .00368 | .00848 | .0142 | .0206 | .0273 | .0344 | .0417 | .0493 | .0570 |
| 98 | .000523 | .00364 | .00840 | .0141 | .0203 | .0270 | .0340 | .0413 | .0487 | .0564 |
| 99 | | .00360 | .00831 | .0139 | .0201 | .0267 | .0337 | .0408 | .0482 | .0558 |
| 100 | .000513 | .00357 | .00823 | .0138 | .0199 | .0265 | .0333 | .0404 | .0478 | .0553 |
| 101 | .000508 | .00353 | .00814 | .0136 | .0197 | .0262 | .0330 | .0400 | .0473 | .0547 |
| 102 | .000503 | .00350 | .00806 | .0135 | .0195 | .0259 | .0327 | .0396 | .0468 | .0542 |
| 103 | .000498 | .00346 | .00799 | .0134 | .0193 | .0257 | .0323 | .0392 | .0463 | .0536 |
| 104 | .000493 | .00343 | .00791 | .0132 | .0191 | .0254 | .0320 | .0389 | .0459 | .0531 |
| 105 | .000488 | .00339 | .00783 | .0131 | .0189 | .0252 | .0317 | .0385 | .0454 | .0526 |








TABLE I—Continued

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 106 | .000484 | .00336 | .00776 | .0130 | .0188 | .0249 | .0314 | | .0450 | .0521 |
| 107 | .000479 | .00333 | .00768 | .0129 | .0186 | .0247 | .0311 | | | .0516 |
| 108 | .000475 | .00330 | .00761 | .0127 | .0184 | .0245 | .0308 | | .0442 | .0511 |
| 109 | .000470 | .00327 | .00754 | .0126 | .0182 | .0242 | .0305 | | .0438 | .0506 |
| 110 | .000466 | .00324 | .00747 | .0125 | .0181 | .0240 | .0302 | | .0433 | .0502 |
| 111 | .000462 | .00321 | .00741 | .0124 | .0179 | .0238 | .0300 | | .0430 | .0497 |
| 112 | .000458 | .00318 | .00734 | .0123 | .0178 | .0236 | .0297 | .0360 | | .0492 |
| 113 | .000454 | .00315 | .00727 | .0122 | .0176 | .0234 | .0294 | .0357 | .0422 | .0488 |
| 114 | .000450 | .00313 | .00721 | .0121 | .0174 | .0232 | .0292 | .0354 | .0418 | .0484 |
| 115 | .000446 | .00310 | .00715 | .0120 | .0173 | .0230 | .0289 | .0351 | .0414 | .0479 |
| 116 | .000442 | .00307 | .00709 | .0119 | .0171 | .0228 | .0287 | .0348 | .0411 | .0475 |
| 117 | .000438 | .00305 | .00702 | .0118 | .0170 | .0226 | .0284 | .0345 | .0407 | .0471 |
| 118 | .000435 | .00302 | .00696 | .0117 | .0168 | .0224 | .0282 | .0342 | | .0467 |
| 119 | .000431 | .00299 | .00691 | .0116 | .0167 | .0222 | .0279 | .0339 | | .0463 |
| 120 | .000427 | .00297 | .00685 | .0115 | .0166 | .0220 | .0277 | .0336 | .0397 | .0459 |
| 121 | .000424 | .00294 | .00679 | .0114 | .0164 | .0218 | .0275 | .0333 | .0394 | .0455 |
| 122 | .000420 | .00292 | .00674 | .0113 | .0163 | .0216 | .0272 | .0330 | | .0451 |
| 123 | .000417 | .00290 | .00668 | .0112 | .0162 | .0215 | .0270 | .0328 | .0387 | .0448 |
| 124 | .000414 | .00287 | .00663 | .0111 | .0160 | .0213 | .0268 | .0325 | .0384 | .0444 |
| 125 | .000410 | .00285 | .00657 | .0110 | .0159 | .0211 | .0266 | .0322 | | .0440 |
| 126 | .000407 | .00283 | .00652 | .0109 | .0158 | .0209 | .0264 | .0320 | .0378 | .0437 |
| 127 | .000404 | .00281 | .00647 | .0108 | .0156 | .0208 | .0262 | .0317 | .0375 | .0433 |
| 128 | .000401 | .00278 | .00642 | .0107 | .0155 | .0206 | .0259 | .0316 | .0372 | .0430 |
| 129 | .000398 | .00276 | .00637 | .0107 | .0154 | .0204 | .0257 | .0312 | | .0427 |
| 130 | .000394 | .00274 | .00632 | .0106 | .0153 | .0203 | .0255 | .0310 | | .0423 |
| 131 | .000391 | .00272 | .00627 | .0105 | .0152 | .0201 | .0253 | .0308 | .0363 | .0420 |
| 132 | .000389 | .00270 | .00622 | .0104 | .0150 | .0200 | .0252 | .0305 | | .0417 |
| 133 | .000386 | .00268 | .00618 | .0103 | .0149 | .0198 | .0250 | .0303 | .0358 | .0414 |
| 134 | .000383 | .00266 | .00613 | .0103 | .0148 | .0197 | .0248 | .0301 | | .0410 |
| 135 | .000380 | .00264 | .00608 | .0102 | .0147 | .0195 | .0246 | .0298 | | .0407 |
| 136 | .000377 | .00262 | .00604 | .0101 | .0146 | .0194 | .0244 | .0296 | | .0404 |
| 137 | .000374 | .00260 | .00599 | .0100 | .0145 | .0192 | .0242 | .0294 | .0347 | .0401 |
| 138 | .000372 | .00258 | .00595 | .00996 | .0144 | .0191 | .0240 | .0292 | .0344 | .0398 |
| 139 | .000369 | .00256 | .00591 | .00989 | .0143 | .0190 | .0239 | .0290 | .0342 | .0395 |
| 140 | .000366 | .00254 | .00587 | .00982 | .0142 | .0188 | .0237 | .0288 | | .0393 |


































TABLE II

Values of p = $p_2$ such that P(c, n, $p_2$) = .10

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .900 | | | | | | | | | |
| 2 | .684 | .949 | | | | | | | | |
| 3 | .536 | .804 | .965 | | | | | | | |
| 4 | .438 | .680 | .857 | .974 | | | | | | |
| 5 | .369 | .584 | .753 | .888 | .979 | | | | | |
| 6 | .319 | .510 | .667 | .799 | .907 | .983 | | | | |
| 7 | .280 | .453 | .596 | .721 | .830 | .921 | .985 | | | |
| 8 | .250 | .406 | .538 | .655 | .760 | .853 | .931 | .987 | | |
| 9 | .226 | .368 | .490 | .599 | .699 | .790 | .871 | .939 | .988 | |
| 10 | .206 | .337 | .450 | .552 | .646 | .733 | .812 | .884 | .945 | .990 |
| 11 | .189 | .310 | .415 | .511 | .599 | .682 | .759 | .831 | .895 | .951 |
| 12 | .175 | .288 | .386 | .475 | .559 | .638 | .712 | .781 | .846 | .904 |
| 13 | .162 | .268 | .360 | .444 | .523 | .598 | .669 | .736 | .799 | .858 |
| 14 | .152 | .251 | .337 | .417 | .492 | .563 | .631 | .695 | .757 | .815 |
| 15 | .142 | .236 | .317 | .393 | .464 | .532 | .596 | .658 | .718 | .774 |
| 16 | .134 | .222 | .300 | .371 | .439 | .504 | .565 | .625 | .682 | .737 |
| 17 | .127 | .210 | .284 | .352 | .416 | .478 | .537 | .594 | .650 | .703 |
| 18 | .120 | .199 | .269 | .334 | .396 | .455 | .512 | .567 | .620 | .671 |
| 19 | .114 | .190 | .257 | .319 | .378 | .434 | .489 | .541 | .592 | .642 |
| 20 | .109 | .181 | .245 | .304 | .361 | .415 | .467 | .518 | .567 | .615 |
| 21 | .104 | .173 | .234 | .291 | .345 | .397 | .448 | .497 | .544 | .590 |
| 22 | .0994 | .166 | .224 | .279 | .331 | .381 | .430 | .477 | .523 | .568 |
| 23 | .0953 | .159 | .215 | .268 | .318 | .366 | .413 | .459 | .503 | .546 |
| 24 | .0915 | .153 | .207 | .258 | .306 | .352 | .398 | .442 | .486 | .526 |
| 25 | .0880 | .147 | .199 | .248 | .295 | .340 | .383 | .426 | .467 | .508 |
| 26 | .0847 | .142 | .192 | .239 | .284 | .328 | .370 | .411 | .451 | .491 |
| 27 | .0817 | .137 | .185 | .231 | .275 | .317 | .358 | .397 | .436 | .475 |
| 28 | .0789 | .132 | .179 | .223 | .265 | .306 | .346 | .385 | .422 | .459 |
| 29 | .0763 | .128 | .173 | .216 | .257 | .297 | .335 | .372 | .409 | .446 |
| 30 | .0739 | .124 | .168 | .209 | .249 | .288 | .325 | .361 | .397 | .432 |
| 31 | .0716 | .120 | .163 | .203 | .241 | .279 | .315 | .350 | .385 | .419 |
| 32 | .0694 | .116 | .158 | .197 | .234 | .271 | .306 | .340 | .374 | .407 |
| 33 | .0674 | .113 | .153 | .191 | .228 | .263 | .297 | .331 | .364 | .396 |
| 34 | .0651 | .110 | .149 | .186 | .221 | .256 | .289 | .322 | .354 | .38? |
| 35 | .0637 | .107 | .145 | .181 | .216 | .249 | .282 | .313 | .345 | .375 |





TABLE II—Continued

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 36 | .0620 | .104 | .141 | .176 | .210 | .242 | .274 | .305 | .336 | .366 |
| 37 | .0603 | .101 | .138 | .172 | .205 | .236 | .267 | .298 | .327 | .357 |
| 38 | .0588 | .0985 | .134 | .167 | .199 | .230 | .261 | .290 | .319 | .348 |
| 39 | .0573 | .0961 | .131 | .163 | .195 | .225 | .254 | .283 | .312 | .340 |
| 40 | .0559 | .0938 | .128 | .159 | .190 | .220 | .248 | .277 | .305 | .332 |
| 41 | .0546 | .0916 | .125 | .156 | .186 | .215 | .242 | .270 | .298 | .324 |
| 42 | .0533 | .0895 | .122 | .152 | .181 | .210 | .237 | .264 | .291 | .317 |
| 43 | .0521 | .0875 | .119 | .149 | .177 | .205 | .232 | .259 | .285 | .310 |
| 44 | .0510 | .0856 | .116 | .146 | .174 | .201 | .227 | .253 | .279 | .304 |
| 45 | .0499 | .0837 | .114 | .142 | .170 | .196 | .222 | .248 | .273 | .297 |
| 46 | .0488 | .0819 | .112 | .140 | .166 | .192 | .218 | .243 | .268 | .291 |
| 47 | .0478 | .0803 | .109 | .137 | .163 | .188 | .213 | .238 | .262 | .285 |
| 48 | .0468 | .0786 | .107 | .134 | .160 | .185 | .209 | .233 | .257 | .280 |
| 49 | .0459 | .0771 | .105 | .131 | .157 | .181 | .205 | .229 | .252 | .274 |
| 50 | .0450 | .0756 | .103 | .130 | .154 | .178 | .201 | .224 | .248 | .269 |
| 51 | .0441 | .0741 | .101 | .126 | .151 | .174 | .197 | .220 | .243 | .264 |
| 52 | .0433 | .0728 | .0991 | .124 | .148 | .171 | .194 | .216 | .239 | .259 |
| 53 | .0425 | .0714 | .0973 | .122 | .145 | .168 | .190 | .212 | .235 | .255 |
| 54 | .0417 | .0701 | .0956 | .120 | .143 | .165 | .187 | .208 | .230 | .250 |
| 55 | .0410 | .0689 | .0939 | .117 | .140 | .162 | .184 | .205 | .227 | .246 |
| 56 | .0403 | .0677 | .0923 | .115 | .138 | .159 | .180 | .201 | .223 | .242 |
| 57 | .0396 | .0666 | .0907 | .113 | .135 | .157 | .177 | .198 | .219 | .238 |
| 58 | .0389 | .0654 | .0892 | .112 | .133 | .154 | .175 | .195 | .216 | .234 |
| 59 | .0383 | .0643 | .0877 | .110 | .131 | .152 | .172 | .191 | .212 | .230 |
| 60 | .0376 | .0633 | .0863 | .108 | .129 | .149 | .169 | .188 | .209 | .226 |
| 61 | .0370 | .0623 | .0849 | .106 | .127 | .147 | .166 | .185 | .206 | .223 |
| 62 | .0365 | .0613 | .0836 | .105 | .125 | .145 | .164 | .183 | .203 | .219 |
| 63 | .0359 | .0603 | .0823 | .103 | .123 | .142 | .161 | .180 | .200 | .216 |
| 64 | .0353 | .0594 | .0810 | .101 | .121 | .140 | .159 | .177 | .197 | .213 |
| 65 | .0348 | .0585 | .0798 | .0999 | .119 | .138 | .156 | .174 | .194 | .210 |
| 66 | .0343 | .0577 | .0786 | .0984 | .117 | .136 | .154 | .172 | .191 | .207 |
| 67 | .0338 | .0568 | .0775 | .0970 | .116 | .134 | .152 | .169 | .188 | .204 |
| 68 | .0333 | .0560 | .0764 | .0956 | .114 | .132 | .150 | .167 | .185 | .201 |
| 69 | .0328 | .0552 | .0753 | .0943 | .113 | .130 | .148 | .165 | .182 | .198 |
| 70 | .0324 | .0544 | .0743 | .0930 | .111 | .128 | .146 | .162 | .179 | .195 |



TABLE II —Continued 



















































































TABLE II—Continued







c 








0 

i 

2 

3 

4 

5 

6 

7 

8 




Sjjf 

ipise 

jSBfP 



B$B8 

Bpf! 

.109 

.120 

.131 

106 



RIer 

So 


.0733 


|||| i! 

.108 

.119 

RhS 

107 




5 



.0842 


.107 

.118 

.128 

108 




iffiS 

sp ?.( 


.0834 


.106 

.116 

.127 

109 

|| 

Eg 


.0477 

.0597 


.0827 

|||j 

.105 

.115 


110 

hi 


.0346 




Bplfl 


.104 

.114 

.125 

111 

112 



.0468 

.0587 

I .0701 



.103 

.113 

.124 

112 

113 

WmM. 



.0582 

■Ipp 

P 


.102 

.112 

.123 

113 

114 



.0460 


PM 

|R 


.101 

.111 

.122 

114 

115 

.0198 

;; 

,0456 

.0572 

R x ;, 

pj 

1 


.111 

.121 

115 

116 

BfcfJ 

.0331 

.0452 

.0567 


.0785 

.0890 

.0994 


.120 

116 

117 



.0449 

.0562 

Rpr 

.0778 

.0883 

.0986 


.119 

117 

118 

Riff? 

R'jw 

.0445 

.0567 

Rnp 

Mm 

.0875 

.0977 

.108 

.118 

118 

119 



.0441 

.0553 

RIM' 

mm 

.0868 

.0969 

.107 

.117 

119 



1 

.0437 

.0648 

1 

n 

.0861 


.106 

.116 

120 

121 

HR! 

1 

.0434 



.0753 

.0854 

■ ' - 

m 

,115 

121 

122 

■ ' K* 

Bfew 


Pp« 

P 



i ■ 


114 

122 

123 


P 

.0427 


Rj'v- 

.0741 

.0841 

Ejf; 


.113 

123 

124 

RT? 

Rf , f? 

.0424 


RM 

s 



.103 

.112 

124 

125 

Eg 

Ek| 


.0527 

Eg* 

1 

'S 

I 

.102 

.111 

125 

126 



.0417 

.0623 


Bp 

tiif p! 


.101 

.110 

126 


BvJi 

M 

.0414 

.0619 

.0620 

P 

R 

Ey 


.110 

127 

128 

Rife 

1 


.0515 


(p 

P 

I 

R 

.109 

128 

■jlrli 

Rtfi! 

P 

.0407 

.0511 

.0610 



Rlil 

1 

108 

129 


K| 


Egg. 

.0507 

.0606 

1 

m 

a|i& 

1 

.107 

130 


1 / 


PfSt 

.0503 


.0696 

,0790 

.0882 

1 

.106 

131 


KI g 


.0398 

.0499 


,0691 


.0876 

1 

.105 

132 

133 

■Rf 



.0495 


■ 

1 

.0869 

I 

.105 

133 

134 

M 

RJ3jj 

.0392 

HIM 

P§Bp 

Kiif 


,0863 

,0952 

.104 

134 

135 


■ 

.0389 

^RiBT*T» 

Els 



.0857 

.0945 

.103 

135 

m 

1 

1 

.0387 

.0485 

.0579 


.0762 



.102 

136 

137 

Bl&I 

■ l 

.0384 

MS 

Hy$| 


.0756 

P 

KJ 

.102 

137 


K' 3? 

.0279 

.0381 

mm 

■ 



1 

1 

.101 

138 


■Pil 

■ 


8 


Bp|g 

.0745 

P 

RSfl? 

.100 

139 

140 

I 

■Mg 

Kii 

Ki 


in 




!.0996 

140 







































































































































































FRANK E. GRUBBS


TABLE II—Concluded

| n | c = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 141 | .0162 | .0273 | .0373 | .0468 | .0558 | .0648 | .0735 | .0821 | .0905 | .0989 |
| 142 | .0161 | .0271 | .0370 | .0464 | .0555 | .0643 | .0730 | .0815 | .0899 | .0982 |
| 143 | .0160 | .0269 | .0368 | .0461 | .0551 | .0639 | .0725 | .0809 | .0893 | .0975 |
| 144 | .0159 | .0267 | .0365 | .0458 | .0547 | .0635 | .0720 | .0804 | .0887 | .0969 |
| 145 | .0158 | .0266 | .0363 | .0455 | .0544 | .0630 | .0715 | .0798 | .0881 | .0962 |
| 146 | .0156 | .0264 | .0360 | .0452 | .0540 | .0626 | .0710 | .0793 | .0875 | .0956 |
| 147 | .0155 | .0262 | .0358 | .0449 | .0536 | .0622 | .0705 | .0788 | .0869 | .0949 |
| 148 | .0154 | .0260 | .0356 | .0446 | .0533 | .0618 | .0701 | .0783 | .0863 | .0943 |
| 149 | .0153 | .0259 | .0353 | .0443 | .0529 | .0614 | .0696 | .0777 | .0858 | .0937 |
| 150 | .0152 | .0257 | .0351 | .0440 | .0526 | .0610 | .0692 | .0772 | .0852 | .0931 |


TABLE III

(Based on Poisson approximation to the binomial distribution)

| Acceptance number c | Values of $a_1 = np_1$ for which P(c, $a_1$) = .95 | Values of $a_2 = np_2$ for which P(c, $a_2$) = .10 |
|---|---|---|
| 0 | .05129 | 2.303 |
| 1 | .3554 | 3.890 |
| 2 | .8177 | 5.322 |
| 3 | 1.366 | 6.681 |
| 4 | 1.970 | 7.994 |
| 5 | 2.613 | 9.275 |
| 6 | 3.285 | 10.53 |
| 7 | 3.981 | 11.77 |
| 8 | 4.695 | 12.99 |
| 9 | 5.425 | 14.21 |
| 10 | 6.169 | 15.41 |
| 11 | 6.924 | 16.60 |
| 12 | 7.690 | 17.78 |
| 13 | 8.464 | 18.96 |
| 14 | 9.246 | 20.13 |
| 15 | 10.04 | 21.29 |


REFERENCES 

[1] Catherine M. Thompson, "Tables of percentage points of the incomplete beta function," Biometrika, Vol. 32 (1944), pp. 151-181.

[2] Maxine Merrington and Catherine M. Thompson, "Tables of percentage points of the inverted beta (F) distribution," Biometrika, Vol. 33 (1945), pp. 77-88.

[3] Paul Peach and S. B. Littauer, "A note on sampling inspection," Annals of Math. Stat., Vol. 17 (1946), pp. 81-84.



ON THE RANGE-MIDRANGE TEST AND SOME TESTS WITH BOUNDED 

SIGNIFICANCE LEVELS 1 

By John E. Walsh 
The RAND Corporation 

1. Summary. This paper is divided into two parts. The significance tests
investigated in Part I concern the population mean and are based on the quantity

[(sample midrange) − (hypothetical mean)]/(sample range).

The case in which the observations are a sample from a normal population is
considered in detail. The tests investigated are summarized in Table 1. These
tests are found to be very efficient for small samples (see Table 4; power efficiency
is defined in section 3). An investigation of several extremely non-normal
populations using the values of $D_\alpha$ obtained for normality indicates that the
significance level of the range-midrange test is not very sensitive to the requirement
of normality for small samples (see Table 6). Also the tests of Table 1
can be applied without computation through the use of an easily constructed
graph (see section 4). These properties suggest that the range-midrange test is
preferable to the Student t-test and the analogue of the Student t-test using the
sample range (see [1] and [2]) whenever the sample size is sufficiently small.

Use of the range-midrange test for the case of normality was proposed by E. S.
Pearson in [3], where properties of the test were experimentally investigated
for the normal and certain non-normal populations.

In Part II several significance tests for the mean are developed which have a
specified significance level for the case of a sample from a normal population
but whose significance level is bounded near the specified value under very
general conditions, one of which is that the observations are from continuous
symmetrical populations. Some of these tests are range-midrange tests. Table
2 contains a summary of the tests and their properties ($x_i$ = ith largest observation,
i = 1, ..., n; conditions (D) are given in section 7).

PART I. THE RANGE-MIDRANGE TEST 

2. Introduction. In 1929 E. S. Pearson proposed using the range-midrange
test for the case of a sample from a normal population (see [3]) and experimentally
investigated some of its properties for sample sizes of 5 and 10 and
significance levels of 2% and 10% (symmetrical tests). Using the constants
(corresponding to the $D_\alpha$ in this paper) determined for the case of normality,
significance level and power function properties of these four tests were experimentally
investigated for several non-normal populations. The results of this
empirical investigation indicated that the range-midrange test is very efficient
for normality and not very sensitive to the assumption of normality if the sample
size is sufficiently small.

1 This paper was presented to a joint meeting of the Institute of Mathematical Statistics
and the American Mathematical Society at New Haven, Conn., in September, 1947. The
results presented in this paper were obtained in the course of research conducted under the
sponsorship of the Office of Naval Research. This research was performed while the
author was at Princeton University.

This paper presents an analytical investigation of properties of the range-midrange
test for n = 2, 3, ..., 10 and a wide range of significance levels.
The results of this investigation confirm the contention that the range-midrange
test is very efficient for normality and small samples; also an analytical investigation
of how the significance level changes for the case of certain extremely
non-normal populations furnishes results which agree with the contention
that the range-midrange test is not very sensitive to the requirement of normality
for sufficiently small samples.

In most cases the results presented in this paper are not directly comparable
with those obtained by Pearson. It was possible, however, to obtain values of
$D_\alpha$ ($\alpha$ = 5%, 1%; n = 5, 10) from the results presented in [3]; these values
were found to be in close agreement with the corresponding values of Table 5.

3. Efficiency of range-midrange. The purpose of this section is to use the
relations derived in section 6 to determine the power efficiencies of tests A, B
and C (see Table 1) for $\alpha$ = 1%, 5% and n = 2, ..., 10. To do this the method
of defining power efficiency given in [4] and [5] will be used. As shown in [5],
it is sufficient to consider only test A; for any fixed n and $\alpha$, tests A, B and C all
have the same power efficiency (note that the significance level of test C is $2\alpha$).

For a normal population (unknown variance) the most powerful test of the
one-sided alternative $\mu < \mu_0$ is the appropriate Student t-test. The procedure
used in determining the power efficiency of test A consists in first computing the
power function of test A for the given values of n and $\alpha$; then the sample size
of the corresponding Student t-test at this significance level is varied until the
power function of the t-test is approximately equal to that of test A. The sample
size (not necessarily integral) thus obtained for the t-test divided by n is
called the power efficiency of test A for the given values of n and $\alpha$. Intuitively
the power efficiency of a test measures the percentage of the total available
information per observation which is being utilized by that test.
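As a small worked illustration of this definition, the ratio can be written out explicitly. The function name `power_efficiency` is illustrative; the matched t-test sample sizes 5.88 and 7.2 are those listed in Table 3 against test A with n = 6 and n = 8 at the 1% level.

```python
def power_efficiency(equivalent_t_sample_size, n):
    """Sample size of the matching Student t-test divided by n."""
    return equivalent_t_sample_size / n

print(f"{power_efficiency(5.88, 6):.0%}")  # 98%
print(f"{power_efficiency(7.2, 8):.0%}")   # 90%
```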

Table 3 contains values of the power function for test A. These values were
computed from equation (3) of section 6 by approximate integration.

The corresponding values of the power function for the Student t-test were
found by using the normal approximation given in [6]. This approximation
was used for fractional degrees of freedom. The sample sizes considered as
well as the resulting power function values are listed in Table 3. A comparison
of the power function values for the two types of tests furnishes the approximate
power efficiencies listed in Table 3.

For n = 2, test A is itself a Student t-test. The power efficiency is therefore
100% for that sample size. This combined with Table 3 furnishes power
efficiencies at the 1% level for n = 2, 6, 8, 10 and at the 5% level for n = 2, 6, 10.
The approximate power efficiencies given in Table 4 for other values of n were
obtained from these values by graphical interpolation.

Table 4 shows that the power efficiency for $\alpha$ = 1% is very good for n < 8,
while for $\alpha$ = 5% the efficiency is good for n < 6.

TABLE 1

Summary of range-midrange tests

Definitions

Test based on sample of size n, ($2 \le n \le 10$), from an arbitrary normal population.
$x_1$ = smallest sample value.
$x_n$ = greatest sample value.
$\mu$ = the mean of the normal population.
$\mu_0$ = given hypothetical mean value to be tested.
D = [(sample midrange) − (hypothetical mean)]/(sample range) = $[(x_n + x_1)/2 - \mu_0]/(x_n - x_1)$.
$D_\alpha$ = constant depending on n and $\alpha$. Values of $\alpha$ versus $D_\alpha$ for $2 \le n \le 10$ and $\alpha$ = 5%, 2.5%, 1%, 0.5% are given in Table 5.

Tests

Test    Accept              If                    Significance Level
(A)     $\mu < \mu_0$       $D < -D_\alpha$       $\alpha$
(B)     $\mu > \mu_0$       $D > D_\alpha$        $\alpha$
(C)     $\mu \ne \mu_0$     $|D| > D_\alpha$      $2\alpha$
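The three decision rules of Table 1 can be sketched as follows. This is a hedged illustration: the function names and the sample values are hypothetical, and the constant .71 is the Table 5 entry for n = 5 at $\alpha$ = 1%.

```python
def midrange_range_statistic(sample, mu0):
    """D = [(x_n + x_1)/2 - mu0] / (x_n - x_1); needs at least two distinct values."""
    lo, hi = min(sample), max(sample)
    return ((hi + lo) / 2 - mu0) / (hi - lo)

def range_midrange_tests(sample, mu0, d_alpha):
    """Outcome of tests A, B and C; True means the stated alternative is accepted."""
    d = midrange_range_statistic(sample, mu0)
    return {
        "A": d < -d_alpha,      # accept mu < mu0, significance level alpha
        "B": d > d_alpha,       # accept mu > mu0, significance level alpha
        "C": abs(d) > d_alpha,  # accept mu != mu0, significance level 2*alpha
    }

# Hypothetical sample of size 5 tested against mu0 = 8 with D_alpha = .71
sample = [4.1, 5.0, 5.6, 6.2, 6.9]
print(round(midrange_range_statistic(sample, 8.0), 3))
print(range_midrange_tests(sample, 8.0, 0.71))
```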


4. Construction of graph. In most problems to which a test of the type
developed in this paper would be applied, the values of the sample can be considered
to have practical lower and upper limits, say a and b. For example, in
many situations zero is a lower limit for the sample values. From a practical
viewpoint these limits on the sample values do not contradict the assumption
that the population is normal, since the area under that part of the normal
distribution which lies outside the interval (a, b) can be considered negligible.
Thus, since $\Pr(u/v \le w) = \Pr(u \le vw)$, test A can be restated in the form:
Accept $\mu < \mu_0$ if the sample point ($x_1$, $x_n$) falls in the region (A) of the $x_1$, $x_n$








TABLE 2 

Some one-sided and symmetrical tests with bounded


JOHN E. WALSH


RANGE-MIDRANGE TEST


TABLE 3

Power function values for test A

Type of  Sample        Approx.     Significance  Approximate values of power function
test     size          efficiency  level         δ = 1/2      δ = 1        δ = 1 1/2  δ = 2   δ = 2 1/2
                       (%)

t        5.4           -           .05           .244         .607         .886       .969
A        6             90          .05           .259         .599         .868       .967

t        [illegible]   -           .05           .333         [illegible]  .971
A        10            75          .05           .351         [illegible]  .962

t        5.88          -           .01           [illegible]  .248         .551       .820    .957
A        6             98          .01           [illegible]  .271         .568       .809    .935

t        7.2           -           .01           [illegible]  .371         .749       .949
A        8             90          .01           [illegible]  .389         .728       .923

t        [illegible]   -           .01           .108         .453         .832       .976
A        10            80          .01           .124         .462         .814       .963



TABLE 4

Power efficiencies of tests A, B and C for α = 5%, 1% and 2 ≤ n ≤ 10

 α      n = 2   3       4       5       6     7       8       9       10
 .01    100%    99.7%   99.4%   99%     98%   96%     90%     85%     80%
 .05    100%    98.5%   96%     93.5%   90%   86.5%   82.5%   78.5%   75%


TABLE 5

Approximate values of D_α for α = 5%, 2.5%, 1%, 0.5% and 2 ≤ n ≤ 10

 α       n = 2    3       4       5      6      7      8      9      10
 0.5%    31.83    3.02*   1.37*   .85*   .66    .55*   .47,   .42,   .39*
 1%      15.91    2.11*   1.04*   .71    .56*   .47,   .42*   .38    .35*
 2.5%     6.35    1.30    .74     .52    .43    .37,   .33    .30    .27,
 5%       3.16    .90*    .55s*   .42,   .35*   .30    .26,   .24    .22,*

* These values of $D_\alpha$ were verified directly by substitution and integration.
The remaining values of $D_\alpha$ for $3 \le n \le 10$ were obtained from these and other
values of $D_\alpha$, ($\alpha \ne .005, .01, .025, .05$), by graphical interpolation.





































plane defined by

$$(1/2 + D_\alpha)x_n + (1/2 - D_\alpha)x_1 < \mu_0, \quad x_n > x_1, \quad a < x_1, \; x_n < b.$$

TABLE 6 

Effect of non-normality on the significance level of the range-midrange test 


Significance Level 




.010 064 ,039 , 018 . 010 .128 . 078 . 030 020 


.0096 .063 .033 . 017 , 0096 ,100 . 006 . 034 .0192 


.015 . 0094 .080 .058 030 . 0188 


.0031 036 .017 .0063 , 0031 .072 . 034 . 0126 . 0062 


. 0024 . 043 010 .0055 .0024 .086 . 032 . 0101 .0018 


.0027 .095 .020 . 0059 . 0027 .100 052 . 0118 0054 


073 . 050 .119 .104 . 073 . 050 . 238 . 208 .140 .100 


.002 061 .055 015 .124 .122 .110 .090 


031 .029 . 031 .031 .031 .029 . 002 . 002 . 002 . 058 


.059 .035 .172 .116 .062 .036 


0016 0007 .144 .104 . 005 042 157 .109 . 067 . 043 


0000 .122 .096 . 061 .046 .130 .102 . 062 . 040 


.030 .017 .131 .080 .038 .021 


.083 055 .031 .018 .114 .071 .038 021 


,0031 .068 


.019 . 090 . 065 . 032 . 020 


027 014 , 0053 , 0020 .112 072 037 . 021 .139 , 080 . 042 . 024 


.0019 , 009 .007 . 039 .024 ,123 . 078 . 043 , 026 




Likewise test B can be restated as:
Accept $\mu > \mu_0$ if $(x_1, x_n)$ falls in the region (B) defined by

$$(1/2 - D_\alpha)x_n + (1/2 + D_\alpha)x_1 > \mu_0, \quad x_n > x_1, \quad a < x_1, \; x_n < b.$$



















































































































































































Test C now becomes:

Accept $\mu \ne \mu_0$ if $(x_1, x_n)$ falls in either of the regions (A) or (B).

Figure 1 (i) contains a schematic diagram of the regions (A) and (B). Test A
can be applied by constructing a graph of the region (A) and giving the
instructions to accept $\mu < \mu_0$ if $(x_1, x_n)$ falls in (A). Similarly for test B and region (B).
Test C is applied by constructing a graph of both (A) and (B) and accepting
$\mu \ne \mu_0$ if $(x_1, x_n)$ falls in either (A) or (B).

Frequently it is desirable to consider more than one significance level
simultaneously. This can be accomplished in the manner indicated by Figure 1 (ii).
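The decision rules of tests A, B and C can also be applied numerically rather than graphically. The following is a minimal sketch in Python, not part of the original paper; the function names are hypothetical, and the constant `d_alpha` is assumed to be read from Table 5 by the user (here .90 for $n = 3$, $\alpha = 5\%$).

```python
def range_midrange_statistic(sample, mu0):
    """Return D = (sample midrange - mu0) / (sample range)."""
    smallest, largest = min(sample), max(sample)
    sample_range = largest - smallest
    if sample_range == 0:
        raise ValueError("sample range is zero; D is undefined")
    return ((largest + smallest) / 2.0 - mu0) / sample_range


def range_midrange_tests(sample, mu0, d_alpha):
    """Apply tests A, B and C of Table 1 for a given constant D_alpha."""
    d = range_midrange_statistic(sample, mu0)
    return {
        "A": d < -d_alpha,      # accept mu < mu0 (level alpha)
        "B": d > d_alpha,       # accept mu > mu0 (level alpha)
        "C": abs(d) > d_alpha,  # accept mu != mu0 (level 2 * alpha)
    }


# Illustrative data only: n = 3, hypothetical mean 10, D_alpha = .90 (Table 5, 5%).
result = range_midrange_tests([1.0, 2.0, 3.0], mu0=10.0, d_alpha=0.90)
print(result)
```

Here $D = (2 - 10)/2 = -4$, so the sample point falls in region (A) and tests A and C accept their alternatives. The condition `d < -d_alpha` is algebraically the same as the inequality $(1/2 + D_\alpha)x_n + (1/2 - D_\alpha)x_1 < \mu_0$ defining region (A).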

5. Effect of non-normality on significance level. It has been shown that the
range-midrange test compares very favorably with the Student $t$-test for
sufficiently small samples and normality. In practice, however, it may happen that
normality is assumed for cases in which the population is not even approximately
normal. Although this represents an error in judgment on the part of the
person applying the test, such situations will undoubtedly occur if the
range-midrange test is used very frequently. The purpose of this section is to
investigate the effect of non-normality on the significance level of the
range-midrange test when the values of $D_\alpha$ based on normality are used. The
corresponding effect of these non-normal populations on the significance level of
the $t$-test was not considered because of computational difficulties; however the
effect of some other non-normal populations on the significance level of the
$t$-test was experimentally investigated by Pearson in [3]. The results of this
empirical investigation and of later investigations show that the significance
level of the $t$-test is not very sensitive to the requirement of normality for small
samples.

Six populations were chosen for investigation. Three of these populations are
symmetrical while the remaining three are strongly asymmetrical. These
particular populations were considered because their probability density
functions have a wide variety of different shapes; also because the significance level
of the range-midrange test can be computed in closed form for these populations.

The populations investigated are defined by their probability density functions.
Table 6 contains a list of the probability density functions considered along with
the resulting significance levels for the range-midrange test. The cases
investigated are $n = 3, 4, 5$ and $\alpha$ = 5%, 2.5%, 1%, 0.5%. Larger values of
$n$ were not used because of computational difficulties. The situation of $n = 2$
was not considered because the $t$-test and the range-midrange test are identical
for this case. The significance levels of Table 6 were computed by making
direct application of (1) and (2) of section 6.
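The paper obtains the significance levels of Table 6 in closed form. As a rough cross-check of the same idea, the level of test A under any population can be estimated by simulation. The sketch below is an illustration only: the uniform population is an assumed example and is not claimed to be one of the six populations of Table 6, and the function name and trial counts are hypothetical.

```python
import random


def estimate_level(draw, mu, n, d_alpha, trials=100000, seed=1):
    """Monte Carlo estimate of the significance level of test A.

    draw(rng) returns one observation from a population whose mean is mu,
    so every acceptance of "mu < mu0" with mu0 = mu is a false one.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        xs = [draw(rng) for _ in range(n)]
        lo, hi = min(xs), max(xs)
        # Region (A): (1/2 + D_alpha) x_n + (1/2 - D_alpha) x_1 < mu0
        if (0.5 + d_alpha) * hi + (0.5 - d_alpha) * lo < mu:
            rejections += 1
    return rejections / trials


# Normal population, n = 2, D_alpha = 3.16 (Table 5): nominal level 5%.
normal_level = estimate_level(lambda r: r.gauss(0.0, 1.0), 0.0, 2, 3.16)

# Assumed non-normal example: uniform on (0, 1), n = 3, D_alpha = .90.
uniform_level = estimate_level(lambda r: r.random(), 0.5, 3, 0.90)

print(normal_level, uniform_level)
```

The normal case serves as a sanity check on the simulation; the second estimate shows how the same constant $D_\alpha$ behaves when normality fails.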


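6. Significance level and power function derivations. The purpose of this
section is to present derivations of the significance level and power function
expressions which were used in the preceding sections. First a general
probability expression will be evaluated. Direct applications of the results obtained
for this expression yield the required significance level and power function
relations.

Let $x_1$ and $x_n$ be the smallest and largest values, respectively, of a sample of
size $n$ drawn from a population with probability density function $f(x)$. The
non-zero probability range of this population is $\gamma < x < \beta$. Also let three
constants $c_1, c_n, c_0$, $(c_1 + c_n = 1)$, be given and consider the value of

$$\Pr(c_1x_1 + c_nx_n < c_0), \quad \text{where} \quad M(z) = \int_{-\infty}^{z} f(y)\,dy.$$

Using direct methods it is found that the value of this expression is given by

$$
\Pr(c_1x_1 + c_nx_n < c_0) =
\begin{cases}
[M(c_0)]^n & \text{if } c_1 = 0, \\[4pt]
0 & \text{if } 0 < c_1 < 1,\ c_0 < \gamma, \\[4pt]
\left[M\!\left(\dfrac{c_0 - c_1\gamma}{c_n}\right)\right]^n - n\displaystyle\int_{c_0}^{(c_0 - c_1\gamma)/c_n} \left[M(V) - M\!\left(\dfrac{c_0 - c_nV}{c_1}\right)\right]^{n-1} f(V)\,dV & \text{if } 0 < c_1 < 1,\ c_0 > \gamma, \\[4pt]
1 - [1 - M(c_0)]^n & \text{if } c_1 = 1, \\[4pt]
0 & \text{if } c_1 > 1,\ c_0 < \min[\gamma,\ c_1\gamma + c_n\beta], \\[4pt]
1 - n\displaystyle\int_{(c_0 - c_1\gamma)/c_n}^{\beta} \left[M(V) - M\!\left(\dfrac{c_0 - c_nV}{c_1}\right)\right]^{n-1} f(V)\,dV - \left[M\!\left(\dfrac{c_0 - c_1\gamma}{c_n}\right)\right]^n & \text{if } c_1 > 1,\ c_1\gamma + c_n\beta < c_0 < \gamma, \\[4pt]
1 - n\displaystyle\int_{c_0}^{\beta} \left[M(V) - M\!\left(\dfrac{c_0 - c_nV}{c_1}\right)\right]^{n-1} f(V)\,dV & \text{if } c_1 > 1,\ c_0 > \gamma.
\end{cases}
\tag{1}
$$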





The value of $\Pr(c_1x_1 + c_nx_n < c_0)$ for $c_1 < 0$ can be obtained from the above
results for $c_1 > 1$. It is easily shown that

$$\Pr(c_1x_1 + c_nx_n < c_0) = 1 - \Pr(c_1'y_1 + c_n'y_n < c_0'), \tag{2}$$

where

$$c_1' = c_n, \quad c_n' = c_1, \quad c_0' = -c_0,$$

and $y_1, y_n$ are the smallest and largest values, respectively, of a sample of size $n$
drawn from a population with probability density function $g(y) = f(-y)$. Thus if
$c_1 < 0$, $c_1' = c_n > 1$ and obvious modifications of the results for $c_1 > 1$ will
furnish the value of $\Pr(c_1'y_1 + c_n'y_n < c_0')$.

The above general results were used in section 5 to investigate the effect of
non-normality on the significance level of the range-midrange test.

Now consider the case in which the $n$ sample values are drawn from a normal
population with mean $\mu$ and variance $\sigma^2$. Then, for test A,

$$\text{Power Function} = \Pr\{(1/2 - D_\alpha)x_1 + (1/2 + D_\alpha)x_n < \mu_0\} = \Pr\{(1/2 - D_\alpha)z_1 + (1/2 + D_\alpha)z_n < \delta\},$$

where

$$z_1 = (x_1 - \mu)/\sigma, \quad z_n = (x_n - \mu)/\sigma, \quad \delta = (\mu_0 - \mu)/\sigma.$$

Using the above results with

$$f(x) = N'(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

where $N(x)$ denotes the distribution function of the unit normal population, it is found that the power function for test A is

$$
\begin{cases}
1 - n\displaystyle\int_{\delta}^{\infty} \left[N(V) - N\!\left(\dfrac{\delta - (1/2 + D_\alpha)V}{1/2 - D_\alpha}\right)\right]^{n-1} N'(V)\,dV & \text{if } D_\alpha < 1/2; \\[6pt]
[N(\delta)]^n & \text{if } D_\alpha = 1/2; \\[6pt]
n\displaystyle\int_{-\delta}^{\infty} \left[N(V) - N\!\left(\dfrac{(D_\alpha - 1/2)V - \delta}{D_\alpha + 1/2}\right)\right]^{n-1} N'(V)\,dV & \text{if } D_\alpha > 1/2.
\end{cases}
\tag{3}
$$

The value of $D_\alpha$ (for given $n$) corresponding to a specified significance level $\alpha$
for test A is obtained by solving the equation

$$\alpha = P_A(0), \tag{4}$$

where $P_A(\delta)$ is the power function for test A. From symmetry and the fact that
test C is a combination of tests A and B, test B has significance level $\alpha$ and test
C significance level $2\alpha$ for this value of $D_\alpha$.

For $n = 2$, test A becomes a Student $t$-test with one degree of freedom if $D_\alpha$
is replaced by $t_\alpha/2$. The relation $D_\alpha = t_\alpha/2$ gives an easily applied method of
computing $D_\alpha$ for this case.
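The power function of test A can be evaluated numerically from $\Pr\{(1/2 - D_\alpha)z_1 + (1/2 + D_\alpha)z_n < \delta\}$. The sketch below is an illustration under stated assumptions, not the author's computation: it uses the integral forms written out above as reconstructed here, a composite Simpson rule with an assumed truncation of the infinite range, and hypothetical function names. For $n = 2$ the result can be compared with $1/2 - \arctan(2D_\alpha)/\pi$, which follows from the $t$-distribution with one degree of freedom.

```python
import math


def norm_cdf(x):
    """Distribution function N(x) of the unit normal population."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Density N'(x) of the unit normal population."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def simpson(func, lower, upper, steps=4000):
    """Composite Simpson rule; steps must be even."""
    h = (upper - lower) / steps
    total = func(lower) + func(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * func(lower + i * h)
    return total * h / 3.0


def power_test_a(n, d_alpha, delta):
    """Pr{(1/2 - D) z_1 + (1/2 + D) z_n < delta} for n unit normal values."""
    c1, cn = 0.5 - d_alpha, 0.5 + d_alpha
    if c1 == 0.0:                      # D_alpha = 1/2
        return norm_cdf(delta) ** n
    if c1 > 0.0:                       # D_alpha < 1/2
        lower = delta
        inner = lambda v: (delta - cn * v) / c1
    else:                              # D_alpha > 1/2
        lower = -delta
        inner = lambda v: (-delta - c1 * v) / cn

    def integrand(v):
        gap = max(norm_cdf(v) - norm_cdf(inner(v)), 0.0)
        return gap ** (n - 1) * norm_pdf(v)

    # The infinite upper limit is truncated where the normal density is negligible.
    integral = n * simpson(integrand, lower, max(lower, 0.0) + 10.0)
    return 1.0 - integral if c1 > 0.0 else integral


# Significance level for n = 2 with D_alpha = t_alpha / 2 = 3.157 (alpha = 5%).
print(power_test_a(2, 3.157, 0.0))
```

Equation (4) can then be solved for $D_\alpha$ by any one-dimensional root finder applied to `power_test_a(n, d, 0.0) - alpha`, since the level decreases as $D_\alpha$ increases.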

Approximate values of $D_\alpha$ for $\alpha$ = 5%, 2.5%, 1%, 0.5% are contained in
Table 5 for $2 \le n \le 10$. For $3 \le n \le 10$, these values were obtained from (3)
and (4) by approximate integration and interpolation. For $n = 2$, the relation
between $D_\alpha$ and $t_\alpha$ was used.

PART II. SOME TESTS WITH BOUNDED SIGNIFICANCE LEVELS 

7. Introduction. In this part some significance tests (for the mean) are
derived which are based on the assumption of a sample from a normal population.
These tests have the property that the significance level is bounded near the
value for normality under very general conditions. Those conditions are

(a) The observations used for a test are independent.

(b) Each observation comes from a continuous symmetrical population
with mean $\mu$.

It is to be emphasized that no two observations are necessarily drawn from
the same population.

The bounded significance level tests developed are summarized in Table 2.
These tests can be used to supplement the tests presented in [5] for $n < 9$, where
the tests of [5] do not furnish a very wide variety of suitable significance levels.

8. Outline of derivations. Let us consider the range-midrange test for the
more general situation in which the set of independent observations used are from
arbitrary but fixed populations satisfying conditions (a) and (b) of section 7. Let $D_\alpha$ be redefined
so that the resulting test A has significance level $\alpha$. Then it is easily seen that
$D_\alpha$ is a monotone decreasing function of $\alpha$. Thus the significance level of the
modified test A will always be less than or equal to $(1/2)^n$ if $D_\alpha > 1/2$. The
significance level bounds for the tests $n = 4$, $\alpha = 5\%$; $n = 5$, $\alpha = 2.5\%$; $n = 6$,
$\alpha = 1\%$; $n = 7$, $\alpha = 0.5\%$ of Table 2 were obtained from this relation and
obvious significance level relations among tests A, B and C.

The significance levels (for normality) for the tests $n = 5$, $\alpha = 5\%$; $n = 6$,
$\alpha = 2.5\%$; $n = 7$, $\alpha = 1\%$; $n = 8$, $\alpha = 0.5\%$ were obtained by approximate
integration of the expression derived for $\Pr[(1/2 + c)x_n + (1/2 - c)x_{n-1} < \mu]$,
$(0 < c < 1/2)$, for several values of $c$ and then graphical interpolation (here $\alpha$
is the one-sided test significance level). The significance level bounds were
determined from

$$(1/2)^n = \Pr(x_n < \mu) \le \Pr[(1/2 + c)x_n + (1/2 - c)x_{n-1} < \mu] \le \Pr[(1/2)(x_n + x_{n-1}) < \mu] = (1/2)^{n-1}.$$

The significance levels for the tests $n = 8$, $\alpha = 1\%$; $n = 9$, $\alpha = 0.5\%$ were
obtained by considering the relations

$$\Pr\{\max[x_{n-1},\ (x_n + x_{n-i})/2] < \mu\} = (1 + i)(1/2)^n, \quad (i = 0, 1, 2, 3),$$

and applying linear interpolation to find a value $c$, $(0 < c < 1/2)$, such that
$\Pr\{\max[x_{n-1},\ 0.5x_n + cx_{n-1} + (1/2 - c)x_{n-2}] < \mu\}$ has the desired value.
The significance level bounds were found from

$$\Pr\{(1/2)(x_n + x_{n-1}) < \mu\} \le \Pr\{\max[x_{n-1},\ 0.5x_n + cx_{n-1} + (1/2 - c)x_{n-2}] < \mu\} \le \Pr\{\max[x_{n-1},\ (1/2)(x_n + x_{n-2})] < \mu\}.$$

The derivation of the power efficiencies listed in Table 2 will not be considered
here. Detailed derivations can be found in [7].
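The relation $\Pr[(1/2)(x_n + x_{n-1}) < \mu] = (1/2)^{n-1}$ used in section 8 depends only on conditions (a) and (b), not on the observations sharing one population. The following simulation is a small sketch of that point; the particular mixture of normal and uniform populations with unequal spreads is an assumption chosen for illustration, and the function name is hypothetical.

```python
import random


def upper_pair_probability(n, trials=200000, seed=7):
    """Estimate Pr{(x_n + x_{n-1}) / 2 < mu} with mu = 0.

    Each observation comes from a different continuous population that is
    symmetrical about 0: normal for even positions, uniform for odd ones.
    """
    rng = random.Random(seed)
    scales = [1.0 + i for i in range(n)]  # deliberately unequal spreads
    hits = 0
    for _ in range(trials):
        xs = sorted(
            rng.gauss(0.0, s) if i % 2 == 0 else rng.uniform(-s, s)
            for i, s in enumerate(scales)
        )
        if 0.5 * (xs[-1] + xs[-2]) < 0.0:
            hits += 1
    return hits / trials


estimate = upper_pair_probability(4)
print(estimate, 0.5 ** 3)
```

For $n = 4$ the estimate should lie close to $(1/2)^3 = .125$ up to sampling error, whatever symmetric populations are substituted.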

REFERENCES

[1] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," Annals
of Math. Stat., Vol. 17 (1946), pp. 71-74.

[2] E. Lord, "The use of range in place of standard deviation in the t-test," Biometrika, Vol. 34
(1947), pp. 41-67.

[3] E. S. Pearson, "The distribution of frequency constants in small samples from
non-normal symmetrical and skew populations," Biometrika, Vol. 21 (1929), pp. 280-286.

[4] John E. Walsh, "On the power function of the sign test for slippage of means," Annals
of Math. Stat., Vol. 17 (1946), pp. 358-362.

[5] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.

[6] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), p. 378.

[7] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," unpublished thesis, Princeton University Library, Princeton,
N. J.



ASYMPTOTIC STUDENTIZATION IN TESTING OF HYPOTHESES 

By Herman Chernoff 1 
Cowles Commission for Research in Economics 

1. Summary. A method suggested by Wald for finding critical regions of
almost constant size and various modifications are considered. Under reasonable
conditions the $s$th step of this method gives a critical region of size $\alpha + R_s(\theta)$
where $\theta$ is the unknown value of the nuisance parameter, $R_s(\theta) = O(N^{-s/2})$ and $N$
is the sample size. The first step of this method gives the region which is
obtained by assuming that an estimate $\hat\theta$ of the nuisance parameter is actually
equal to $\theta$.

2. Introduction. The problem of nuisance parameters often arises in the
testing of hypotheses in the following form: It is desired to construct a test of a
hypothesis $H$ so that the probability of rejecting $H$ if it is true is equal to $\alpha$.
However the probability distribution of the data is not uniquely determined
by $H$. Indeed, if the hypothesis is true then the observations have a distribution
depending on a nuisance parameter $\theta$ whose value is not known. Generally a
critical region will have a size which depends on the value of $\theta$. Neyman has
done considerable work on the problem of finding similar regions, i.e., regions
whose size is independent of $\theta$.

Wald has suggested the following method of finding critical regions whose
size is almost independent of $\theta$. Suppose that $t$ is a statistic such that if $\theta$
were known then the critical region $t < c_1(\theta)$ would be a good critical region
for testing the hypothesis $H$. Suppose also that $\hat\theta$ is an estimate of $\theta$ and that
$g(t, \hat\theta \mid \theta)$ represents the joint distribution of $t$, $\hat\theta$ under $H$ when $\theta$ is the value
of the nuisance parameter. Then consider the regions

$$t < c_1(\hat\theta) \quad \text{where} \quad \Pr\{t < c_1(\theta)\} = \alpha \text{ independent of } \theta;$$

$$t < c_1(\hat\theta) + c_2(\hat\theta) \quad \text{where} \quad \Pr\{t - c_1(\hat\theta) < c_2(\theta)\} = \alpha \text{ independent of } \theta;$$

$$t < c_1(\hat\theta) + \cdots + c_s(\hat\theta) \quad \text{where} \quad \Pr\{t - c_1(\hat\theta) - \cdots - c_{s-1}(\hat\theta) < c_s(\theta)\} = \alpha \text{ independent of } \theta.$$

Under the assumption that $\hat\theta$ is close to $\theta$ it is reasonable to expect that
$\Pr\{t < c_1(\hat\theta)\}$ would be close to $\alpha$. It might also be expected that
$\Pr\{t < c_1(\hat\theta) + c_2(\hat\theta)\}$ would be even closer to $\alpha$.

This method has been shown to have good properties when considered from
the asymptotic point of view. Suppose that $t$, $\hat\theta$ are two sequences of statistics
(depending on $N$, the size of the sample or an analogous variable) with
distribution represented by $g(t, \hat\theta \mid \theta)$ where $N$ is understood to be present. Then it has
been shown that under reasonable conditions, with modifications for the sake of
calculation,

$$|\Pr\{t < c_1(\hat\theta) + \cdots + c_s(\hat\theta)\} - \alpha| = O(N^{-s/2}).$$

The statement of the theorem presenting this result will be given in section 5.
It has also been shown that if, roughly speaking, $\hat\theta$ is distributed almost
symmetrically about $\theta$, the above result may be obtained in half the steps, i.e.,

$$|\Pr\{t < c_1(\hat\theta) + \cdots + c_s(\hat\theta)\} - \alpha| = O(N^{-s}).$$

1 This paper is based on a dissertation written under the supervision of Professor
Abraham Wald and submitted as partial fulfilment of the requirements for Ph.D. in the
Graduate Division of Applied Mathematics of Brown University.

It is true that under relatively weak conditions and for fixed $N$ it is possible for
any $\epsilon > 0$ to obtain a function $h(\hat\theta)$ such that $|\Pr\{t < h(\hat\theta)\} - \alpha| < \epsilon$. However
such a critical region can have very poor properties from the point of view of the
alternative hypotheses, especially if $h(\hat\theta)$ is a very wildly oscillating function.
On the other hand this objection does not apply to Wald's method for large $N$
because

$$|c_1^{(r)}(\theta)| < M, \quad r = 0, 1, \ldots, s;$$

$$|c_2^{(r)}(\theta)| < MN^{-1/2}, \quad r = 0, 1, \ldots, s - 1;$$

$$\cdots$$

$$|c_s^{(r)}(\theta)| < MN^{-(s-1)/2}, \quad r = 0, 1,$$

and hence $c_1(\hat\theta) + \cdots + c_s(\hat\theta)$ is almost constant over "that small range in
which $\hat\theta$ will probably fall."

In the above it has been implied that $\theta$ is a one dimensional variable. However
the results are easily extended to the case where $\theta$ is a $k$-dimensional variable.

The direct application of the method is often quite difficult because of the
calculations involved. Modifications can be applied which simplify the
calculations. Such modifications usually consist of changing the $c_r(\theta)$ by a small
amount provided the remainder is simple and "well behaved." A case where
considerable simplifications can be made is that where $g_1(t \mid \hat\theta, \theta)$, the conditional
distribution of $t$, can be expanded in a Taylor expansion,

$$g_1(t \mid \hat\theta, \theta) = g_1(c_1(\theta) \mid \theta, \theta) + (t - c_1(\theta))\frac{\partial g_1}{\partial t} + (\hat\theta - \theta)\frac{\partial g_1}{\partial \hat\theta} + \cdots + \frac{(t - c_1(\theta))^i(\hat\theta - \theta)^j}{i!\,j!}\frac{\partial^{i+j} g_1}{\partial t^i\,\partial\hat\theta^j} + \cdots,$$

where the partial derivatives, evaluated at $t = c_1(\theta)$, $\hat\theta = \theta$, "behave." This case will be described in detail in
section 3, and an example previously treated by Welch (see [1]) will be discussed
in section 4.

Another case where simplifications often arise is the asymptotic case, that is
the case where $g(t, \hat\theta \mid \theta)$ has an asymptotic expansion. The asymptotic case
may also be regarded as an extension of the following partition principle which
is very useful. If $g(t, \hat\theta \mid \theta) = g_0(t, \hat\theta \mid \theta) + h(t, \hat\theta \mid \theta)$ and $\iint |h|\,dt\,d\hat\theta < MN^{-s/2}$
and if $\varphi(\hat\theta)$ is such that

$$\left|\int_{-\infty}^{\infty} d\hat\theta \int_{-\infty}^{\varphi(\hat\theta)} dt\; g_0(t, \hat\theta \mid \theta) - \alpha\right| < MN^{-s/2},$$

then $|\Pr\{t < \varphi(\hat\theta)\} - \alpha| < MN^{-s/2}$. Thus our theorems apply to $g(t, \hat\theta \mid \theta)$
if $g = g_0 + h$ where $g_0$ has sufficient differentiability properties.


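3. The Taylor expansion treatment. Let $g(t, \hat\theta \mid \theta) = g_1(t \mid \hat\theta, \theta)\,g_2(\hat\theta \mid \theta)$ where
$g_1$ is the conditional density of $t$ given $\hat\theta$ and $g_2(\hat\theta \mid \theta)$ is the marginal density of $\hat\theta$.
$g_3(t \mid \theta) = \int_{-\infty}^{\infty} d\hat\theta\; g(t, \hat\theta \mid \theta)$ is the marginal density of $t$. In what follows we shall
use $M$ as a generic bound. Thus the statement $f(t, \theta) < M(\theta_1, \theta_2)$, $\theta_1 < \theta < \theta_2$,
means that there is a constant $M$ depending on $(\theta_1, \theta_2)$ and independent of
$t$, $\theta$, $N$ so that $f(t, \theta) < M(\theta_1, \theta_2)$, $\theta_1 < \theta < \theta_2$.

First we obtain $c_1(\theta)$ so that $\Pr\{t < c_1(\theta)\} = \alpha$.

Then we have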

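Theorem 1. If for every finite interval $(\theta_1, \theta_2)$,

(i) $\left|\dfrac{\partial^p g_3(t \mid \theta + \Delta)}{\partial\Delta^p}\right| < G_1(t, \theta) < G_2(t)$, $|\Delta| < \Delta'(\theta_1, \theta_2, N)$, $p = 0, 1, \ldots, s$,
$\theta_1 < \theta,\ \theta + \Delta < \theta_2$,
where $\int_{-\infty}^{\infty} G_2(t)\,dt < M(\theta_1, \theta_2)$, $G_1$ and $G_2$ may depend on $N$, $\theta_1$, and $\theta_2$;

(ii) $\dfrac{\partial^{p+q} g_3(t \mid \theta)}{\partial\theta^p\,\partial t^q}$ is continuous in $t$, $\theta$ and
bounded in absolute value by $M(C_1, C_2, \theta_1, \theta_2)$ for $p + q \le s$, $\theta_1 < \theta < \theta_2$,
$C_1 < t < C_2$;

(iii) $0 < \dfrac{1}{M(C_1, C_2, \theta_1, \theta_2)} < g_3(t \mid \theta)$ for $\theta_1 < \theta < \theta_2$, $C_1 < t < C_2$;

(iv) $0 < \alpha < 1$,

then $\Pr\{t < c_1(\theta)\} = \alpha$ defines $c_1(\theta)$ uniquely and so that $|c_1^{(p)}(\theta)| < M(\theta_1, \theta_2)$
for $p = 0, 1, \ldots, s$, $\theta_1 < \theta < \theta_2$.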

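Proof. Since $g_3(t \mid \theta)$ is positive, $c_1(\theta)$ is uniquely defined by condition (i).
From this and conditions (i) and (ii) it follows that $c_1'(\theta)$ exists and is given by

$$\int_{-\infty}^{c_1(\theta)} dt\, \frac{\partial g_3(t \mid \theta)}{\partial\theta} + c_1'(\theta)\, g_3(c_1(\theta) \mid \theta) = 0. \tag{1}$$

We may continue in this fashion differentiating formally $p \le s$ times to get

$$\int_{-\infty}^{c_1(\theta)} dt\, \frac{\partial^p g_3(t \mid \theta)}{\partial\theta^p} + \cdots + [c_1^{(j_1)}(\theta)]^{h_1} \cdots [c_1^{(j_k)}(\theta)]^{h_k}\, \frac{\partial^{i+j} g_3(c_1(\theta) \mid \theta)}{\partial t^i\,\partial\theta^j} + \cdots + c_1^{(p)}(\theta)\, g_3(c_1(\theta) \mid \theta) = 0, \tag{2}$$

$$j_1, j_2, \ldots, j_k,\ h_1, \ldots, h_k,\ i + j < p.$$

From the continuity and positiveness it follows that $c_1^{(p)}(\theta)$ is continuous. Since
$\int_{-\infty}^{\infty} G_2(t)\,dt < M(\theta_1, \theta_2)$ it follows that there is a constant $M(\theta_1, \theta_2)$ so that

$$\int_{-\infty}^{-M(\theta_1, \theta_2)} G_2(t)\,dt < \alpha, \qquad \int_{M(\theta_1, \theta_2)}^{\infty} G_2(t)\,dt < 1 - \alpha.$$

Thus

$$|c_1(\theta)| < M(\theta_1, \theta_2).$$

From (1) and condition (i) it follows easily that $|c_1'(\theta)| < M(\theta_1, \theta_2)$. Similarly
we obtain $|c_1^{(p)}(\theta)| < M(\theta_1, \theta_2)$ for $\theta_1 < \theta < \theta_2$.

While the conditions (i) to (iv) suffice to insure the results of the theorem
they are not necessary. It is often possible to obtain these properties of $c_1(\theta)$
in particular examples where $g_3(t \mid \theta)$ does vanish at points so long as $g_3(c_1(\theta) \mid \theta)$
behaves well.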

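Definition 1. $\varphi_m(\hat\theta)$ is an admissible function of order $m$ ($m \le s$, $s$ fixed in
advance) if $\varphi_m(\hat\theta) = c_1(\hat\theta) + \cdots + c_m(\hat\theta)$ where $\Pr\{t < c_1(\theta)\} = \alpha$ and

$$|c_i^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-(i-1)/2}, \quad p = 0, 1, \ldots, s + 1 - i, \quad \theta_1 < \theta < \theta_2. \tag{3}$$

Now let

$$H_k(\theta) = N^{k/2}\, E(\hat\theta - \theta)^k = N^{k/2} \int (\hat\theta - \theta)^k\, g_2(\hat\theta \mid \theta)\, d\hat\theta \tag{4}$$

and

$$G_{pq}(\theta) = \left.\frac{\partial^{p+q} g_1(t \mid \hat\theta, \theta)}{\partial t^p\,\partial\hat\theta^q}\right|_{t = c_1(\theta),\ \hat\theta = \theta}. \tag{5}$$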

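We have

Theorem 2. If

(i) $\Pr\{t < c_1(\theta)\} = \alpha$, $0 < \alpha < 1$, and $|c_1^{(p)}(\theta)| < M(\theta_1, \theta_2)$,
$\theta_1 < \theta < \theta_2$, $p = 0, 1, \ldots, s$;

(ii) $\delta = \delta(N) = O(1)$ is a function of $N$ such that

$$\int_{|\hat\theta - \theta| > \delta} |\hat\theta - \theta|^k\, g_2(\hat\theta \mid \theta)\, d\hat\theta < M(\theta_1, \theta_2)\,N^{-s/2}, \quad \theta_1 < \theta < \theta_2, \quad k = 0, 1, \ldots, s;$$

(iii) $\left|\dfrac{\partial^{p+q} g_1(t \mid \hat\theta, \theta)}{\partial t^p\,\partial\hat\theta^q}\right| < M(\theta_1, \theta_2)$, $p + q = s$,
$|t - c_1(\theta)| < \rho$, $|\hat\theta - \theta| < \delta$,
where
$\rho = \max |c_1(\hat\theta) - c_1(\theta)| + N^{-(1/2) + \eta}$, $\eta > 0$, $\theta_1 < \theta < \theta_2$;

(iv) $|H_k^{(p)}(\theta)| < M(\theta_1, \theta_2)$ for $p = 0, 1, \ldots, s - k$, $k = 1, \ldots, s$,
$\theta_1 < \theta < \theta_2$;

(v) $|G_{pq}^{(j)}(\theta)| < M(\theta_1, \theta_2)$ for $j = 0, 1, \ldots, s - p - q$,
$p + q \le s - 1$;

(vi) $\varphi_m(\hat\theta)$ is an admissible function of order $m \le s$,

then

$$\Pr\{t < \varphi_m(\hat\theta)\} = \alpha + r_{m1}(\theta)N^{-1/2} + \cdots + r_{ms}(\theta)N^{-s/2} \tag{6}$$

where

$$|r_{mj}^{(p)}(\theta)| < M(\theta_1, \theta_2) \quad \text{for } p = 0, 1, \ldots, s - j, \quad j \le s, \quad \theta_1 < \theta < \theta_2.$$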

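Proof. Expand $g_1(t \mid \hat\theta, \theta)$ in a Taylor expansion about $t = c_1(\theta)$, $\hat\theta = \theta$,
with remainder terms of order $s$ in $t - c_1(\theta)$, $\hat\theta - \theta$, and expand $c_i(\hat\theta)$ about
$\hat\theta = \theta$ where the remainder term is of order $s + 1 - i$. Then for $|\hat\theta - \theta| < \delta$,
we have

$$\int_{-\infty}^{\varphi_m(\hat\theta)} g_1(t \mid \hat\theta, \theta)\,dt = P\{(\hat\theta - \theta)^j,\ c_i^{(p)}(\theta),\ G_{pq}\} + RN^{-s/2}, \tag{7}$$

where $P$ is a polynomial and $|R| < M(\theta_1, \theta_2) \sum_{i=0}^{s} |\hat\theta - \theta|^i N^{i/2}$ for $|\hat\theta - \theta| < \delta$.

Integrating over $|\hat\theta - \theta| < \delta$, we use conditions (ii), (iv) and (v) and the
theorem follows. By a similar argument we have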
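Theorem 3. If

(i) the conditions of Theorem 2 hold for each $(\theta_1, \theta_2)$ so that

$$-\infty \le \theta_1' < \theta_1 < \theta_2 < \theta_2' \le \infty$$

and

(ii) $g_1(c_1(\theta) \mid \theta, \theta) > 1/M(\theta_1, \theta_2) > 0$, $\theta_1 < \theta < \theta_2$,

then the sequence

$$
\begin{aligned}
\varphi_1(\hat\theta) &= c_1(\hat\theta); \\
\varphi_2(\hat\theta) &= c_1(\hat\theta) - \frac{r_{11}(\hat\theta)\,N^{-1/2}}{g_1(c_1(\hat\theta) \mid \hat\theta, \hat\theta)}; \\
&\ \ \vdots \\
\varphi_m(\hat\theta) &= \varphi_{m-1}(\hat\theta) - \frac{r_{m-1,m-1}(\hat\theta)\,N^{-(m-1)/2}}{g_1(c_1(\hat\theta) \mid \hat\theta, \hat\theta)},
\end{aligned}
\tag{8}
$$

is a sequence of admissible functions such that

$$\Pr\{t < \varphi_m(\hat\theta)\} = \alpha + R(\theta)\,N^{-m/2}, \quad m \le s, \tag{9}$$

where $|R(\theta)| < M(\theta_1, \theta_2)$ for $\theta_1' < \theta_1 < \theta < \theta_2 < \theta_2'$.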

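These theorems permit us to obtain and to calculate critical regions whose
size is asymptotically close to $\alpha$.

In Theorem 2, condition (ii) was much stronger than necessary. It may be
relaxed if we define

$$H_k(\theta) = \int_{|\hat\theta - \theta| < \delta} N^{k/2}\, g_2(\hat\theta \mid \theta)\,(\hat\theta - \theta)^k\, d\hat\theta,$$

where

$$\Pr\{|\hat\theta - \theta| > \delta\} < M(\theta_1, \theta_2)\,N^{-s/2}, \quad \delta = \delta(N) = O(1).$$

However this may complicate the calculations.

The symmetric case arises when the first moment almost vanishes, i.e.

$$|H_1^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-1/2}, \quad p = 0, 1, \ldots, s - 1, \quad \theta_1 < \theta < \theta_2. \tag{10}$$

In this case we have, instead of the sequence given in Theorem 3, the sequence

$$
\begin{aligned}
\varphi_1(\hat\theta) &= c_1(\hat\theta); \\
&\ \ \vdots \\
\varphi_m(\hat\theta) &= \varphi_{m-1}(\hat\theta) - \frac{r_{m-1,2m-2}(\hat\theta)\,N^{-(m-1)} + r_{m-1,2m-1}(\hat\theta)\,N^{-(2m-1)/2}}{g_1(c_1(\hat\theta) \mid \hat\theta, \hat\theta)},
\end{aligned}
\tag{11}
$$

which is a sequence of admissible functions such that

$$\Pr\{t < \varphi_m(\hat\theta)\} = \alpha + r_{m,2m}(\theta)\,N^{-m} + \cdots + r_{m,s}(\theta)\,N^{-s/2},$$

$$|r_{mj}^{(p)}(\theta)| < M(\theta_1, \theta_2), \quad \theta_1 < \theta < \theta_2, \quad p = 0, 1, \ldots$$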

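4. An example. The following example previously treated by Welch from a
different point of view will furnish an illustration of the applicability of the
theorems to the case where $\theta$ is a $k$-dimensional parameter. It will also serve
as an example of an extended type of symmetry. That is, it has the property
that $|H_{2k+1}(\theta)| < M(\theta_1, \theta_2)\,N^{-1/2}$, and hence, in the sequence (11), the $r_{m,2m+1}(\theta)$
terms effectively vanish thereby simplifying the calculations considerably.

We suppose that $t$ is a normally distributed variable with mean $\mu$ and variance
$\sigma^2 = \lambda_1\sigma_1^2 + \cdots + \lambda_k\sigma_k^2$ where the $\lambda_i$ are known positive constants, the $\sigma_i^2$ are
unknown parameters each of which is independently estimated by $s_i^2$ where
$N_i s_i^2/\sigma_i^2$ has the $\chi^2$ distribution with $N_i$ degrees of freedom. It is desired to test
the hypothesis that $\mu = 0$ so that the probability of rejecting the hypothesis
if it is true should equal $\alpha$. Under the hypothesis the joint density distribution
of $t, s_1^2, \ldots, s_k^2$ is given by

$$\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^2/2\sigma^2} \prod_{i=1}^{k} g(s_i^2 \mid \sigma_i^2, N_i),$$

where the moments of $s_i^2 - \sigma_i^2 = \hat\theta_i - \theta_i$ are given by the coefficients of $u^k/k!$
in the expansion about $u = 0$ of $e^{-u\sigma_i^2}\,(1 - (2u\sigma_i^2/N_i))^{-N_i/2}$, so that, by (4),

$$H_1(\sigma_i^2) = 0,$$

$$H_2(\sigma_i^2) = 2\sigma_i^4;$$

$$H_3(\sigma_i^2) = 8\sigma_i^6\,N_i^{-1/2};$$

$$H_4(\sigma_i^2) = (12 + 48N_i^{-1})\,\sigma_i^8.$$

We define $c_1(\theta)$ by $\Pr\{t < c_1(\theta)\} = \alpha$ where

$$\theta = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2) \quad \text{and} \quad \hat\theta = (s_1^2, \ldots, s_k^2),$$

$$c_1(\theta) = c_1\sigma,$$

the constant $c_1$ on the right being the $\alpha$ point of the unit normal distribution.

Now $\alpha_2(\theta) - \alpha = \Pr\{c_1(\theta) < t < c_1(\hat\theta)\}$ may be computed within terms of order
$N_i^{-2}$ by expanding

$$c_1(\hat\theta) \approx c_1\sigma + \frac{c_1}{2\sigma} \sum_i \lambda_i(s_i^2 - \sigma_i^2) - \frac{c_1}{8\sigma^3} \sum_i \sum_j \lambda_i\lambda_j (s_i^2 - \sigma_i^2)(s_j^2 - \sigma_j^2),$$

whence

$$\alpha_2(\theta) - \alpha \approx \int \cdots \int ds_1^2 \cdots ds_k^2 \prod_i g(s_i^2 \mid \sigma_i^2, N_i)\; \frac{e^{-c_1^2/2}}{\sqrt{2\pi}} \left[ \frac{c_1}{2\sigma^2} \sum_i \lambda_i(s_i^2 - \sigma_i^2) - \frac{c_1}{8\sigma^4} \sum_i \sum_j \lambda_i\lambda_j (s_i^2 - \sigma_i^2)(s_j^2 - \sigma_j^2) - \frac{c_1^3}{8\sigma^4} \sum_i \sum_j \lambda_i\lambda_j (s_i^2 - \sigma_i^2)(s_j^2 - \sigma_j^2) \right].$$

Thus

$$c_2(\theta) = \frac{c_1(1 + c_1^2)}{4\sigma^3} \sum_i \lambda_i^2 \sigma_i^4 N_i^{-1}$$

and

$$\alpha_3(\theta) = \Pr\left\{t < c_1\hat\sigma + \frac{c_1(1 + c_1^2)}{4\hat\sigma^3} \sum_i \lambda_i^2 s_i^4 N_i^{-1}\right\} = \alpha + O\!\left(\sum_i N_i^{-2}\right),$$

where

$$\hat\sigma^2 = \sum_i \lambda_i s_i^2.$$

Further approximations become somewhat complex and should be carried out
in a systematic fashion.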
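The effect of the second term can be checked by simulation. The sketch below is an illustration only and rests on assumptions: it takes the correction in the form $c_2(\hat\theta) = c_1(1 + c_1^2)(4\hat\sigma^3)^{-1}\sum \lambda_i^2 s_i^4 N_i^{-1}$ with $c_1$ the $\alpha$ point of the unit normal distribution, draws only the variance estimates at random, and averages the conditional normal probability of the region. The function name, parameter values and trial count are hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_probabilities(lams, sigma2s, dofs, alpha, trials=50000, seed=0):
    """Size of the regions t < c1(theta_hat) and t < c1(theta_hat) + c2(theta_hat)."""
    rng = random.Random(seed)
    unit_normal = NormalDist()
    c1 = unit_normal.inv_cdf(alpha)
    sigma = math.sqrt(sum(l * v for l, v in zip(lams, sigma2s)))
    first_step = second_step = 0.0
    for _ in range(trials):
        # s_i^2 = sigma_i^2 * chi-square(N_i) / N_i; chi-square(N) is gamma(N/2, scale 2).
        s2 = [v * rng.gammavariate(n / 2.0, 2.0) / n for v, n in zip(sigma2s, dofs)]
        sigma_hat = math.sqrt(sum(l * s for l, s in zip(lams, s2)))
        cut1 = c1 * sigma_hat
        cut2 = (c1 * (1.0 + c1 * c1) / (4.0 * sigma_hat ** 3)
                * sum(l * l * s * s / n for l, s, n in zip(lams, s2, dofs)))
        # Given the s_i^2, t is normal with mean 0 and standard deviation sigma.
        first_step += unit_normal.cdf(cut1 / sigma)
        second_step += unit_normal.cdf((cut1 + cut2) / sigma)
    return first_step / trials, second_step / trials


plain, corrected = rejection_probabilities([1.0, 1.0], [1.0, 1.0], [10, 10], 0.05)
print(plain, corrected)
```

With two variances each estimated on 10 degrees of freedom, the first-step region has size noticeably above 5%, while the two-term region should be much closer to 5%, in line with the orders of magnitude stated above.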


5. Remarks. The range of application in practical statistical problems of the
theorems of section 3 may be somewhat more limited than that of the original
method proposed by Wald. Concerning the original method, the following
theorems have been established.

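Theorem 4. If

(i) $\Pr\{t < c_1(\theta)\} = \alpha$, $0 < \alpha < 1$, where $|c_1^{(p)}(\theta)| < M(\theta_1, \theta_2)$, $\theta_1 < \theta < \theta_2$,
$p = 0, 1, \ldots, s$;

(ii) $\left|\dfrac{\partial^{i+j} g(t, \hat\theta + \Delta \mid \theta + \Delta)}{\partial t^i\,\partial\Delta^j}\right| < G(\hat\theta, \theta)$ for $i + j \le s - 1$, $C_1 < t < C_2$,
$\theta_1 < \theta,\ \theta + \Delta < \theta_2$, $|\Delta| < \Delta'$, where $G(\hat\theta, \theta)$ depends on $C_1, C_2, \theta_1, \theta_2, N$,
and is integrable in $\hat\theta$ over $(-\infty, \infty)$;

(iii) $\left|\dfrac{\partial^{i+j} g(t, \hat\theta + \Delta \mid \theta + \Delta)}{\partial t^i\,\partial\Delta^j}\right|_{\Delta = 0} < L(\hat\theta, \theta)$, $i + j \le s - 1$, $C_1 < t < C_2$, $\theta_1 < \theta < \theta_2$,
where $\int_{-\infty}^{\infty} L(\hat\theta, \theta)\,|\hat\theta - \theta|^k\,d\hat\theta < M(\theta_1, \theta_2, C_1, C_2)\,N^{-k/2}$, $k = 0, 1, \ldots$;

(iv) $0 < A(C_1, C_2, \theta_1, \theta_2) < A(t) < g_3(t \mid \theta) < B(t) < B(C_1, C_2, \theta_1, \theta_2) < \infty$;
$\theta_1 < \theta < \theta_2$, $C_1 < t < C_2$,
$\int_{-\infty}^{\infty} B(t)\,dt < M(\theta_1, \theta_2)$;

(v) $g(t, \hat\theta \mid \theta) > 0$,

then a sequence $c_1(\theta), c_1^*(\theta), c_2(\theta), \ldots, c_s^*(\theta)$ exists where $c_m(\theta)$ is uniquely defined in
$(\theta_1, \theta_2)$ by $\Pr\{t - c_1^*(\hat\theta) - \cdots - c_{m-1}^*(\hat\theta) < c_m(\theta)\} = \alpha$, and

$$|c_m^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-(m-1)/2}, \quad p = 0, 1, \ldots, s - m + 1, \quad \theta_1 < \theta < \theta_2,$$

and $c_m^*(\theta)$ is any function so that

$$|c_m^{*(p)}(\theta) - c_m^{(p)}(\theta)| < MN^{-s/2} \quad \text{for } \theta_1 < \theta < \theta_2, \quad p = 0, 1, \ldots, s - m,$$

and

$$|c_m^{*(p)}(\theta)| \le M(\theta_1, \theta_2), \quad -\infty < \theta < \infty, \quad p = 0, 1, \ldots, s - m + 1.$$

Finally for $c_m^*(\theta)$ arbitrary within the above conditions,

$$|\Pr\{t - c_1^*(\hat\theta) - \cdots - c_s^*(\hat\theta) < 0\} - \alpha| < M(\theta_1, \theta_2)\,N^{-s/2} \quad \text{for } \theta_1 < \theta < \theta_2.$$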

The conditions on the derivatives with respect to $\Delta$ are natural because
the intuitive approach to the method seems to hinge on the assumption that
$g(t, \hat\theta + \Delta \mid \theta + \Delta)$ changes gradually with respect to $\Delta$ "independent" of the
value of $N$. This would not be true of $g(t, \hat\theta \mid \theta + \Delta)$ for large $N$.

The $c_m^*(\theta)$ were introduced in Theorem 4 because in practical examples it is
usually found too difficult to compute $c_m(\theta)$ efficiently. On the other hand
there are many alternative ways of obtaining functions with the properties
of the $c_m^*(\theta)$. The $c_2(\theta)$, $c_3(\theta)$ etc. mentioned in Theorems 1, 2, 3 play the role
of the $c_m^*(\theta)$ in Theorem 4 with the exception of the condition on $c_m^*(\theta)$ for $\theta$ outside
$(\theta_1, \theta_2)$. The exception is due to the fact that Theorems 1, 2, 3 correspond
to the "infinite case." Theorem 4 is applicable to those cases where one is
willing to assume that $\theta$ lies in $(\theta_1, \theta_2)$. It often happens that there is no such
reason or that the conditions of the theorem hold only for every closed proper
subinterval of $(\theta_1', \theta_2')$ but not for $\theta_1' < \theta < \theta_2'$ itself. In these cases we may
apply

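Theorem 5. If

(i) all of the conditions of Theorem 4 apply to every finite proper closed subinterval
$(\theta_1, \theta_2)$ of $(\theta_1', \theta_2')$ where $(\theta_1', \theta_2')$ may be an infinite interval;

(ii) $\Pr\{|\hat\theta - \theta| > \delta(N)\} < M(\theta_1, \theta_2)\,N^{-s/2}$ for $\theta_1' < \theta_1 < \theta < \theta_2 < \theta_2'$, where
$\delta(N) = O(1)$ unless $\theta_1'$ or $\theta_2'$ is finite, in which case $\delta(N) = o(1)$,

then a sequence $c_1(\theta), c_1^*(\theta), c_2(\theta), \ldots, c_s^*(\theta)$ exists, where $c_m(\theta)$ is uniquely defined in
$(\theta_1', \theta_2')$ by $\Pr\{t - c_1^*(\hat\theta) - c_2^*(\hat\theta) - \cdots - c_{m-1}^*(\hat\theta) < c_m(\theta)\} = \alpha$, so that for every
$(\theta_1, \theta_2)$,

$$|c_m^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-(m-1)/2} \quad \text{if } \theta_1' < \theta_1 < \theta < \theta_2 < \theta_2', \quad p = 0, 1, \ldots, s - m + 1,$$

and for $c_m^*(\theta)$ arbitrary within the above conditions

$$|\Pr\{t < c_1^*(\hat\theta) + \cdots + c_m^*(\hat\theta)\} - \alpha| < M(\theta_1, \theta_2)\,N^{-m/2} \quad \text{if } \theta_1' < \theta_1 < \theta < \theta_2 < \theta_2', \quad m \le s.$$

Essentially this theorem can be proved by reference to the proof of Theorem 4
applied to the function

$$
g^*(t, \hat\theta \mid \theta) =
\begin{cases}
g(t, \hat\theta \mid \theta) & \text{for } |\hat\theta - \theta| \le \delta; \\
0 & \text{for } |\hat\theta - \theta| > \delta.
\end{cases}
$$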

Some of the conditions in Theorems 4 and 5 are stronger than necessary. For
example $g > 0$ may be replaced by a weaker condition where $g$ is positive in a
region about $t = c_1(\theta)$. On the other hand the condition $\Pr\{|\hat\theta - \theta| > \delta\} <
MN^{-s/2}$ in Theorem 5 is necessary to the argument used in the proof. It is easy
to construct trivial examples where the results of this theorem apply although
this condition is not satisfied. However an example has also been constructed
where all the conditions of Theorem 5 hold except for this condition and the
method of Wald fails to give the results.

These theorems are very easily extended to the $k$-dimensional parameter case
by replacing the conditions on the derivatives with respect to $\Delta$ by the same
order mixed derivatives with respect to $\Delta_1, \Delta_2, \ldots, \Delta_k$ of

$$g(t, \hat\theta_1 + \Delta_1, \hat\theta_2 + \Delta_2, \ldots, \hat\theta_k + \Delta_k \mid \theta_1 + \Delta_1, \ldots, \theta_k + \Delta_k).$$

The symmetric case arises when the distribution of $\hat\theta$ is almost symmetric
about $\theta$. More exactly we have



Theorem 6. If

(i) All the conditions of Theorem 4 hold and $L(\hat\theta, \theta)$ has the additional property that

jf K 0 ~ *)*£(*, 6) d6 < M(0 lt 9i)N~ l , 6i< 6 <8,, 

and 

-, <+> « i+/ 

(ii) Sva?' ^^® ^ 29 “ ^ < L 0> 9)\b - e\, 

Ci < t < Ci, Qi < 8 < 8i , i + j < s — 1, 

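then it is possible to construct a sequence $c_1^*(\theta), c_2^*(\theta), \ldots$ as in Theorem 4
so that

$$|c_m^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-(m-1)}, \quad p = 0, 1, \ldots, s - 2m + 2, \quad \theta_1 < \theta < \theta_2;$$

$$|c_m^{*(p)}(\theta) - c_m^{(p)}(\theta)| < M(\theta_1, \theta_2)\,N^{-m}, \quad p = 0, 1, \ldots, s - 2m + 1, \quad \theta_1 < \theta < \theta_2;$$

$$|c_m^{*(p)}(\theta)| < M(\theta_1, \theta_2), \quad p = 0, 1, \ldots, s - 2m + 2, \quad -\infty < \theta < \infty;$$

and

$$|\Pr\{t < c_1^*(\hat\theta) + \cdots + c_m^*(\hat\theta)\} - \alpha| < M(\theta_1, \theta_2)\,N^{-m}, \quad \theta_1 < \theta < \theta_2.$$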

Theorem 5 can also be extended to the symmetric case.

It is often possible in the theory of statistics to obtain an asymptotic expansion
of the distribution of $t$, $\hat\theta$. The treatment of such cases is often very simple
because of the prominent role played by the normal distribution in such
asymptotic expansions. Suppose that

$$g(t, \hat\theta \mid \theta) = \sqrt{N}\,\gamma(t, \psi \mid \theta),$$

where $\psi = \sqrt{N}(\hat\theta - \theta)$; $\gamma$ = density distribution of $(t, \psi)$;

$$\gamma(t, \psi \mid \theta) = \gamma_0(t, \psi \mid \theta) + N^{-1/2}\gamma_1(t, \psi \mid \theta) + \cdots + N^{-(s-1)/2}\gamma_{s-1}(t, \psi \mid \theta) + \rho(t, \psi \mid \theta)\,N^{-s/2},$$

$\gamma_0, \gamma_1, \ldots, \gamma_{s-1}$ are independent of $N$;

$$\iint |\rho|\,dt\,d\psi < M(\theta_1, \theta_2), \quad \theta_1 < \theta < \theta_2;$$

$$\iint |\gamma_i|\,d\psi\,dt < M(\theta_1, \theta_2), \quad \theta_1 < \theta < \theta_2.$$



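Correspondingly we have

$$g(t, \hat\theta \mid \theta) = g_0(t, \hat\theta \mid \theta) + N^{-1/2}g_1(t, \hat\theta \mid \theta) + \cdots + N^{-(s-1)/2}g_{s-1}(t, \hat\theta \mid \theta) + r(t, \hat\theta \mid \theta)\,N^{-s/2},$$

where

$$g_i(t, \hat\theta \mid \theta) = \sqrt{N}\,\gamma_i(t, \psi \mid \theta), \qquad r(t, \hat\theta \mid \theta) = \sqrt{N}\,\rho(t, \psi \mid \theta).$$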

Then if we define &(0) by I d& digo = 

*L<jq tLoo 

c*( 0 ) by [ dll 

J-flC ‘'t 


ej (J)+" '+Cn-i(h+«n tf) 

[(()+ ■ ■ ,+Cfl-l (^) 

<!«')+' ■+««-](*> 


dtyo 


= a - [ d8 I dtlg„ + 0iiT + ■' 1 + 

J— 00 


(irt—l)|!l 


or by 

Cm{0) f dholcM Q\&) 


—M 


«i(0+":+5ib~i(^) 


<% + 0ilT 1/2 + ••• 4- f7*-ilV' 


—((m-l)/Hi 


we obtain 

|Pr{f < ci(«)+'•* + h )iT ,/J } 

if $g$ obeys the conditions of Theorem 4 except that we need only $s - i + 1$
derivatives for $g_i(t, \hat\theta \mid \theta)$. The above definitions of $c_m(\theta)$ correspond to the
$c_m^*(\theta)$ in Theorem 4. Analogues of Theorems 5 and 6 also apply to the asymptotic
case.

REFERENCE 

[1] B. L. Welch, "On the studentization of several variances," Annals of Math. Stat.,
Vol. 18 (1947), p. 118.



SOME LOW MOMENTS OF ORDER STATISTICS 
By H. J. Godwin
University College of Swansea, Wales

1. Introduction. In a paper on order statistics from several populations
[1] there were given, among other results, the means, variances, covariances, and
correlations of order statistics in samples of ten or less from a normal population.
These were obtained by numerical integration, and on account of the difficulties
arising therefrom, some results were given to only two decimal places. More
recently, Jones [3] has shown that some of the integrals, for sample sizes not
greater than four, can be evaluated explicitly.

In this note these results are supplemented in two ways. For a paper which
the author has recently submitted to Biometrika integrals were evaluated which
can be used to give some of the results in [1] to more places of decimals. It is
also shown that the table of explicit values can be extended.

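2. Approximate values. Let the population studied be normal with mean
zero and variance unity, and let the members of a sample of n be $x(1 \mid n) >
x(2 \mid n) > \cdots > x(n \mid n)$. The integrals available are

$$\psi(i) = \int_{-\infty}^{\infty} F^i(x)(1 - F(x))^i\,dx \quad (1 \le i \le 5), \quad\text{and}$$

$$\psi(i, j) = \int_{-\infty}^{\infty}\int_{x}^{\infty} F^i(x)(1 - F(y))^j\,dy\,dx \quad (i + j \le 10),$$

where

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

These were evaluated to ten places of decimals, the last place possibly being in
error by one or two units.

For the purpose in hand we define also

$$\alpha(i, j) = \int_{-\infty}^{\infty} x F^i(x)(1 - F(x))^j\,dx = -\alpha(j, i)$$

and

$$\beta(i, j) = \int_{-\infty}^{\infty} x^2 f(x) F^i(x)(1 - F(x))^j\,dx = \beta(j, i).$$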

Now, on integrating by parts, we have

$$\int_x^{\infty} (1 - F(y))\,dy = -x(1 - F(x)) + \int_x^{\infty} y f(y)\,dy,$$

and for $f(x)$ as defined above (so that in what follows we restrict ourselves to the
normal distribution only), the second integral is $f(x)$. Hence $\psi(i, 1) + \alpha(i, 1) =
1/(i + 1)$ and we can construct a table of $\alpha$'s by using also the relation

$$\alpha(i, j) - \alpha(i + 1, j) = \alpha(i, j + 1).$$

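Again, on integrating by parts, we have

$$\beta(i, i) = \int_{-\infty}^{\infty} \frac{F^{i+1}(x)}{i + 1}\left[i x^2 f(x)(1 - F(x))^{i-1} - 2x(1 - F(x))^{i}\right] dx \quad (i > 0)$$

$$= \frac{i}{i + 1}\left\{\tfrac{1}{2}\beta(i - 1, i - 1) - \beta(i, i)\right\} - \frac{2}{i + 1}\,\alpha(i + 1, i),$$

using the fact that, in this particular case, $2F - 1$ is an odd function and
$F(1 - F)$ an even function of $x$.

Hence $\beta(i, i) = \dfrac{\tfrac{1}{2} i\,\beta(i - 1, i - 1) - 2\alpha(i + 1, i)}{2i + 1}$, and using
$\beta(i, j) - \beta(i + 1, j) = \beta(i, j + 1)$ we can find the $\beta$'s.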

Finally we put y(i, j ) = 1,J '^ t _ a ^’ A w hi c h can be shown by 

V 

an integration to be equal in this case to y(j, i). 

Now

(1) $E(x(i \mid n) - x(i + 1 \mid n)) = {}^{n}C_i \int_{-\infty}^{\infty} F^{n-i}(x)(1 - F(x))^{i}\,dx,$

as was proved by Irwin [2]. By the symmetry here this integral is the same
if $i$ and $n - i$ are interchanged, and since $F^a(1 - F)^b + F^b(1 - F)^a$ is a polynomial
in $F(1 - F)$ (as may be seen by putting $F = \tfrac{1}{2} + G$) the integrals (1) can be
expressed in terms of the $\psi(i)$. Using the fact that the expected value of the
median is zero the $E(x(i \mid n))$ follow.
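Formula (1) can be checked numerically. The sketch below is only an illustration, not the author's procedure: it integrates the right-hand side of (1) with a composite Simpson rule (the helper names `simpson`, `expected_gap` and `expected_values` are hypothetical), and it fixes the level of the expected values by requiring them to sum to zero, which for a symmetric parent is equivalent to the zero expectation of the median used in the text.

```python
import math

def F(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(g, lo=-10.0, hi=10.0, steps=4000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0

def expected_gap(n, i):
    """Right-hand side of (1): C(n, i) times the integral of F^(n-i) (1 - F)^i."""
    return math.comb(n, i) * simpson(lambda x: F(x) ** (n - i) * (1.0 - F(x)) ** i)

def expected_values(n):
    """E x(1|n), ..., E x(n|n) in descending order, rebuilt from the gaps."""
    offsets = [0.0]
    for i in range(1, n):
        offsets.append(offsets[-1] + expected_gap(n, i))
    top = sum(offsets) / n  # the n expected values sum to zero by symmetry
    return [top - off for off in offsets]

print(expected_values(5))
```

For n = 5 the first entry can be compared with the mean of x(1|5) in Table 1.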

The frequency function of $x = x(i \mid n)$ is

$$\frac{n!}{(i - 1)!\,(n - i)!}\,f(x)\,(1 - F(x))^{i-1}F^{n-i}(x),$$

and so

(2) $E\big(x(i \mid n)^2\big) = i\,{}^{n}C_i\,\beta(i - 1, n - i).$

The joint frequency function of $x_i = x(i \mid n)$ and $x_j = x(j \mid n)$ is

$$\frac{n!}{(i - 1)!\,(j - i - 1)!\,(n - j)!}\,f(x_i)f(x_j)(1 - F(x_i))^{i-1}(F(x_i) - F(x_j))^{j-i-1}F^{n-j}(x_j)$$

(taking $j > i$), and to find $E(x_i x_j)$ we multiply by $x_i x_j$ and integrate, $x_j$ going
from $-\infty$ to $\infty$, and $x_i$ from $x_j$ to $\infty$. On expanding $(1 - \{1 - F(x_i)\} - F(x_j))^{j-i-1}$
by the multinomial theorem a typical term is

(3) $\iint x_i x_j f(x_i) f(x_j)(1 - F(x_i))^{i-1+r}F^{n-j+s}(x_j)\,dx_i\,dx_j.$

TABLE 1

Means and standard deviations

Statistic   Mean        Standard      Statistic   Mean        Standard
                        Deviation                             Deviation

x(1|2)       .5641896   .8256453      x(1|8)      1.4236003   .6106530
x(1|3)       .8462844   .7479754      x(2|8)       .8522249   .4892862
x(2|3)      0           .6698292      x(3|8)       .4728225   .4480723
x(1|4)      1.0293754   .7012241      x(4|8)       .1525144   .4326503
x(2|4)       .2970114   .6003793      x(1|9)      1.4850132   .5977903
x(1|5)      1.1629645   .6689799      x(2|9)       .9322975   .4750755
x(2|5)       .4950190   .5581388      x(3|9)       .5719708   .4317205
x(3|5)      0           .5355685      x(4|9)       .2745259   .4129877
x(1|6)      1.2672064   .6449241      x(5|9)      0           .4075553
x(2|6)       .6417550   .5287511      x(1|10)     1.5387527   .5868083
x(3|6)       .2015468   .4961981      x(2|10)     1.0013571   .4631674
x(1|7)      1.3521784   .6260334      x(3|10)      .6560591   .4183339
x(2|7)       .7573743   .5066882      x(4|10)      .3757647   .3974153
x(3|7)       .3527070   .4687447      x(5|10)      .1226678   .3886565
x(4|7)      0           .4587449



We integrate by parts with respect to $x_i$ and then with respect to $x_j$: the integral
(3) is then seen to be $\gamma(i + r, n - j + s + 1)$, and

(4) $E(x_i x_j) = \dfrac{n!}{(i - 1)!\,(j - i - 1)!\,(n - j)!}\displaystyle\sum_{r=0}^{j-i-1}\sum_{s=0}^{j-i-1-r}\dfrac{(-1)^{r+s}(j - i - 1)!}{r!\,s!\,(j - i - 1 - r - s)!}\,\gamma(i + r, n - j + s + 1).$

Using (1), (2) and (4), the values in Tables 1, 2, and 3 are obtained. The
values are estimated to be correct, except for sample sizes 9 and 10, for which
there may be errors of one or two units in the last place given. Missing values
are filled in by considerations of symmetry.
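Individual entries of Table 1 can be reproduced directly from the frequency function of x(i|n). The following is a minimal sketch under stated assumptions: it uses a plain Simpson rule rather than the recurrences for the alpha's and beta's described above, and the function names (`moment`, `mean_and_sd`) are hypothetical.

```python
import math

def f(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def F(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(g, lo=-10.0, hi=10.0, steps=4000):
    """Composite Simpson rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3.0

def moment(n, i, power):
    """E[x(i|n)^power], with x(1|n) the largest member of the sample."""
    const = math.factorial(n) // (math.factorial(i - 1) * math.factorial(n - i))
    density = lambda x: const * f(x) * (1.0 - F(x)) ** (i - 1) * F(x) ** (n - i)
    return simpson(lambda x: x ** power * density(x))

def mean_and_sd(n, i):
    """Mean and standard deviation of x(i|n)."""
    m1 = moment(n, i, 1)
    m2 = moment(n, i, 2)
    return m1, math.sqrt(m2 - m1 * m1)

print(mean_and_sd(5, 1))
```

The printed pair can be compared with the row for x(1|5) in Table 1.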


3. Exact values. All the integrals occurring for $\psi(i)$ or $\psi(i, j)$ can, by suitable
transformations, and the integration of one variable over the range $-\infty$ to
















TABLE 2

Variances and covariances

Now if Q is ax\ the integral is Wir/a (this is, in effect, stated by Jones). 
By elementary integration we have also that if Q = ax 2 + 2 hxy + by 2 , the 
integral is 

Vab=t* j? - arc tan 


TABLE 4

Exact expected values

x(1|4):          $\sqrt{\pi}\,[(2/5)a + (2/5)c]$
x(2|4):          $\sqrt{\pi}\,[(2/5)a - (6/5)c]$
x(1|5):          $\sqrt{\pi}\,[(1/3)a + c]$
x(2|5):          $\sqrt{\pi}\,[(2/3)a - 2c]$
x(3|5):          0

x(1|5)²:         1 + b + d
x(2|5)²:         1 - 4d
x(3|5)²:         1 - 2b + 6d
x(1|5)x(2|5):    b + d
x(1|5)x(3|5):    2a - 2b - 2d - f
x(1|5)x(4|5):    -2a + 3f
x(1|5)x(5|5):    -2f
x(2|5)x(3|5):    -2a + 3b - d + f
x(2|5)x(4|5):    4a - 4b + 4d - 4f

x(1|6)²:         1 + b + 3d
x(2|6)²:         1 + b - 9d
x(3|6)²:         1 - 2b + 6d
x(1|6)x(2|6):    b + 3d
x(1|6)x(3|6):    3a - 2b + 3c - 6d - 3f
x(1|6)x(4|6):    -3a - 9c + 9f
x(1|6)x(5|6):    12c - 6f
x(1|6)x(6|6):    -6c
x(2|6)x(3|6):    -3a + 4b - 3c + 3f
x(2|6)x(4|6):    9a - 6b + 9c + 6d - 15f
x(2|6)x(5|6):    -6a - 18c + 18f
x(3|6)x(4|6):    -6a + 6b - 6d + 6f



and if Q is $ax^2 + by^2 + cz^2 + 2fyz + 2gzx + 2hxy$, the integral is

Wl{l + arc tan gJ} v^ +arc tan + arc tan /j v^r}- 

where $\Delta = abc + 2fgh - af^2 - bg^2 - ch^2$.

The author has not succeeded in obtaining similar results with a higher
number of variables; it is possible that elementary functions no longer suffice
then.
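The exact expressions of Table 4 lend themselves to a quick numerical cross-check. The sketch below evaluates the constants a, b, c, d, f from their closed forms and tests three things: a mean against Table 1, a standard deviation against Table 1, and the identity that the expected products E[x(i|n) x(j|n)], summed over j, equal 1 for a standard normal parent. That identity is a standard property (the sample mean is independent of deviations from it) and is an assumption of this sketch, not a statement made in the text; the variable names are illustrative.

```python
import math

# Constants entering Table 4, from their closed forms.
a = 15 / (4 * math.pi)
b = 5 * math.sqrt(3) / (4 * math.pi)
c = 15 / (2 * math.pi ** 2) * math.asin(1 / 3)
d = 5 * math.sqrt(3) / (2 * math.pi ** 2) * math.asin(1 / 4)
f = 15 / math.pi ** 2 * math.asin(1 / math.sqrt(6))

root_pi = math.sqrt(math.pi)
mean_1_4 = root_pi * (2 / 5 * a + 2 / 5 * c)      # E x(1|4)
mean_1_5 = root_pi * (a / 3 + c)                  # E x(1|5)
sd_1_5 = math.sqrt(1 + b + d - mean_1_5 ** 2)     # from E x(1|5)^2 = 1 + b + d

# E[x(2|5) x(j|5)] for j = 1..5. The last entry uses the reflection symmetry
# E[x(2|5) x(5|5)] = E[x(1|5) x(4|5)] of a symmetric parent distribution.
row2_n5 = [b + d,
           1 - 4 * d,
           -2 * a + 3 * b - d + f,
           4 * a - 4 * b + 4 * d - 4 * f,
           -2 * a + 3 * f]

print(mean_1_4, mean_1_5, sd_1_5, sum(row2_n5))
```

The row sum comes out as 1 because the coefficients of a, b, d and f cancel exactly.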








Using these results we can obtain exact expressions for $\psi(1)$, $\psi(2)$ and $\psi(i, j)$
for $1 \le i, j$; $i + j \le 6$, which give, in addition to Jones' results, the exact expected
values in Table 4, wherein

a = $15/4\pi$ = 1.19366 20732,

b = $5\sqrt{3}/4\pi$ = .68916 11193,

c = $(15/2\pi^2)$ arc sin (1/3) = .25824 50843,

d = $(5\sqrt{3}/2\pi^2)$ arc sin (1/4) = .11085 93167,

f = $(15/\pi^2)$ arc sin $(1/\sqrt{6})$ = .63913 55493.

REFERENCES

[1] C. Hastings, Jr., F. Mosteller, J. W. Tukey and C. P. Winsor, "Low moments for
small samples: a comparative study of order statistics," Annals of Math. Stat.,
Vol. 18 (1947).

[2] J. O. Irwin, "The further theory of Francis Galton's individual-difference problem,"
Biometrika, Vol. 17 (1925).

[3] H. L. Jones, "Exact lower moments of order statistics in small samples from a normal
distribution," Annals of Math. Stat., Vol. 19 (1948).



ON A THEOREM OF HSU AND ROBBINS 

By P. Erdős
Syracuse University 

Let $f_1(x), f_2(x), \cdots$ be an infinite sequence of measurable functions defined
on a measure space $X$ with measure $m$, $m(X) = 1$, all having the same distribution
function $G(t) = m(x; f_k(x) \le t)$. In a recent paper Hsu and Robbins¹
prove the following theorem: Assume that

(1) $\int_{-\infty}^{\infty} t\,dG(t) = 0,$

(2) $\int_{-\infty}^{\infty} t^2\,dG(t) < \infty.$

Denote by $S_n$ the set $\left(x; \left|\sum_{k=1}^{n} f_k(x)\right| > n\right)$, and put $M_n = m(S_n)$. Then $\sum_{n=1}^{\infty} M_n$
converges.

It is clear that the same holds if $\left|\sum_{k=1}^{n} f_k(x)\right| > n$ is replaced by $\left|\sum_{k=1}^{n} f_k(x)\right| > c \cdot n$
(replace $f_k(x)$ by $c \cdot f_k(x)$).

It was conjectured that the conditions (1) and (2) are necessary for the
convergence of $\sum_{n=1}^{\infty} M_n$. Dr. Chung pointed out to me that in this form the
conjecture is inaccurate; to see this it suffices to put $f_k(x) = \tfrac{1}{2}(1 + r_k(x))$ where
$r_k(x)$ is the $k$th Rademacher function. Clearly $|f_k(x)| \le 1$; thus $M_n = 0$,
thus $\sum_{n=1}^{\infty} M_n$ converges, but $\int_{-\infty}^{\infty} t\,dG(t) \ne 0$. On the other hand we shall show
in the present note that the conjecture of Hsu and Robbins is essentially correct.
In fact we prove

Theorem I. The necessary and sufficient condition for the convergence of
$\sum M_n$ is that

(1') $\left|\int_{-\infty}^{\infty} t\,dG(t)\right| < 1,$

and (2) should hold.
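Theorem I can be illustrated with two small cases in which $M_n$ is computable exactly. The sketch below assumes, for illustration only, that the $f_k$ are independent fair coin variables: first with values +2 and -2 (mean 0, finite variance), then with values 0 and 1 as in Dr. Chung's example. The function names are hypothetical.

```python
from math import comb

def M_symmetric_pm2(n):
    """Exact M_n = m(|f_1 + ... + f_n| > n) when each f_k is +2 or -2 with probability 1/2."""
    # With k values equal to +2 the sum is 2(2k - n); |sum| > n means 2|2k - n| > n.
    favourable = sum(comb(n, k) for k in range(n + 1) if 2 * abs(2 * k - n) > n)
    return favourable / 2 ** n

def M_chung_example(n):
    """M_n for f_k = (1 + r_k)/2 with values 0 and 1: the sum lies in [0, n]."""
    favourable = sum(comb(n, k) for k in range(n + 1) if k > n)
    return favourable / 2 ** n

partial_200 = sum(M_symmetric_pm2(n) for n in range(1, 201))
partial_400 = sum(M_symmetric_pm2(n) for n in range(1, 401))
print(partial_200, partial_400 - partial_200, M_chung_example(50))
```

In the first case the partial sums of $M_n$ settle quickly, in line with conditions (1) and (2); in the second $M_n$ vanishes identically although the mean is 1/2, which is compatible with (1') but not with (1).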

In proving the sufficiency of Theorem I, we can assume without loss of generality
that (1) holds. It suffices to replace $f_k(x)$ by $(f_k(x) - C)$ where $C = \int_{-\infty}^{\infty} t\,dG(t)$.
The following proof of the sufficiency of Theorem I (in other words essentially for
the theorem of Hsu and Robbins) is simpler and quite different from theirs.
Put

(3) $a_i = m(x; |f_k(x)| > 2^i),$

since the $f_k$'s all have the same distribution, $a_i$ clearly does not depend on $k$.
We evidently have

$$\sum_{i=0}^{\infty} 2^{2i-1} a_i \le \sum_{i=0}^{\infty} 2^{2i}(a_i - a_{i+1}) \le \int_{-\infty}^{\infty} t^2\,dG(t) \le \sum_{i=0}^{\infty} 2^{2i+2}(a_i - a_{i+1}) \le \sum_{i=0}^{\infty} 2^{2i+2} a_i.$$

¹ Proc. Nat. Acad. Sciences, 1947, pp. 26-31.

Thus (2) is equivalent to

(4) $\sum_{i=0}^{\infty} 2^{2i} a_i < \infty.$


Let $2^i \le n < 2^{i+1}$. Put

$S_n^{(1)} = (x; |f_k(x)| > 2^{i-2}$, for at least one $k \le n)$,

$S_n^{(2)} = (x; |f_{k_1}(x)| > n^{4/5}, |f_{k_2}(x)| > n^{4/5}$, for at least two $k_1 \le n$, $k_2 \le n)$,

$S_n^{(3)} = \left(x; \left|\sum_{k=1}^{n}{}' f_k(x)\right| > 2^{i-2}\right)$,

where the dash indicates that the $k$ with $|f_k(x)| > n^{4/5}$ are omitted. We
evidently have

$$S_n \subset S_n^{(1)} \cup S_n^{(2)} \cup S_n^{(3)}.$$

For if $x$ is not in $S_n^{(1)} \cup S_n^{(2)} \cup S_n^{(3)}$ then clearly

$$\left|\sum_{k=1}^{n} f_k(x)\right| < 2^{i-2} + 2^{i-2} < n.$$


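Thus to prove the convergence of $\sum_{n=1}^{\infty} M_n$ it will suffice to show that

(5) $\sum_{n=1}^{\infty}\left(m(S_n^{(1)}) + m(S_n^{(2)}) + m(S_n^{(3)})\right) < \infty.$

From (3) we obtain that $m(S_n^{(1)}) \le n \cdot a_{i-2} < 2^{i+1} a_{i-2}$. Thus from (4)

(6) $\sum_{n=1}^{\infty} m(S_n^{(1)}) = \sum_{i=0}^{\infty}\ \sum_{2^i \le n < 2^{i+1}} m(S_n^{(1)}) < \sum_{i=0}^{\infty} 2^{2i+5} a_i < \infty.$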

From (4) we evidently have that for large $u$

$$m(x; |f_k(x)| > u) < 1/u^2.$$

Thus since the $f$'s are independent and have the same distribution function it
follows that for sufficiently large $n$,

$$m(S_n^{(2)}) \le \sum_{1 \le k_1 < k_2 \le n} m(x; |f_{k_1}(x)| > n^{4/5}, |f_{k_2}(x)| > n^{4/5})$$

$$< n^2\, m(x; |f_1(x)| > n^{4/5}) \cdot m(x; |f_2(x)| > n^{4/5}) < n^2 \cdot n^{-16/5} = n^{-6/5}.$$

Hence

(7) $\sum_{n=1}^{\infty} m(S_n^{(2)}) < \infty.$



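Put

$$f_k^{+}(x) = \begin{cases} f_k(x) & \text{for } |f_k(x)| \le n^{4/5}, \\ 0 & \text{otherwise.} \end{cases}$$

Clearly the $f_k^{+}(x)$ are independent and have the same distribution function
$G^{+}(t)$. Put

(8) $\int_{-\infty}^{\infty} t\,dG^{+}(t) = \epsilon, \qquad g_k(x) = f_k^{+}(x) - \epsilon.$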



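We have from (8) that $\int_X g_k(x)\,dm = 0$, and by (1) that $\epsilon \to 0$ as $n \to \infty$. We
evidently have

$$\int_X \Big(\sum_{k=1}^{n} g_k(x)\Big)^4 dm = \sum_{k=1}^{n} \int_X g_k^4(x)\,dm + 6 \sum_{1 \le k < l \le n} \int_X g_k^2(x)\, g_l^2(x)\,dm.$$

Now since $\max |g_k(x)| \le n^{4/5} + \epsilon$,

$$\int_X g_k^4(x)\,dm \le (n^{4/5} + \epsilon)^2 \int_X g_k^2(x)\,dm < c_1 n^{8/5},$$

$$\int_X g_k^2(x)\, g_l^2(x)\,dm = \int_X g_k^2(x)\,dm \int_X g_l^2(x)\,dm < c_2.$$

Thus

$$\int_X \Big(\sum_{k=1}^{n} g_k(x)\Big)^4 dm < c_3 n^{13/5}.$$

Hence

(9) $m\Big(x; \Big|\sum_{k=1}^{n} g_k(x)\Big| > n/16\Big) < c_4 n^{-7/5}.$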

Thus from (8), (9), $|f_k^{+}(x)| \le |g_k(x)| + 1/16$ (for $\epsilon < 1/16$) and $n/8 < 2^{i-2}$
we have

$$m\Big(x; \Big|\sum_{k=1}^{n}{}' f_k(x)\Big| > 2^{i-2}\Big) = m\Big(x; \Big|\sum_{k=1}^{n} f_k^{+}(x)\Big| > 2^{i-2}\Big) \le m\Big(x; \Big|\sum_{k=1}^{n} g_k(x)\Big| > n/16\Big) < c_4 n^{-7/5},$$

or

(10) $m(S_n^{(3)}) < c_4 n^{-7/5}.$

Thus finally from (6), (7) and (10) we obtain (5) and this completes the proof
of the sufficiency of Theorem I.



Next we prove the necessity of Theorem I, in other words we shall show that if
$\sum_{n=1}^{\infty} M_n$ converges then (1') and (2) hold.

First we prove (2). The following proof was suggested by Dr. Chung, who
simplified my original proof. By a simple rearrangement we see that (2) is
equivalent to

(11) $\sum_{n=1}^{\infty} n \int_{|t| > cn} dG(t) < \infty$

for any $c > 0$; while

(12) $\int_{-\infty}^{\infty} |t|\,dG(t) < \infty$

is equivalent to

(13) $\sum_{n=1}^{\infty} \int_{|t| > cn} dG(t) < \infty$

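for any $c > 0$. Now we have clearly,

$$(x; |f_n(x)| > 2n) \subset S_{n-1} \cup S_n.$$

Hence

$$\sum_{n} \int_{|t| > 2n} dG(t) \le \sum_{n} \{m(S_{n-1}) + m(S_n)\} < \infty.$$

Thus we obtain (12). Since the terms of this series are non-increasing it follows
that

(14) $\lim_{n \to \infty} n \int_{|t| > 2n} dG(t) = 0.$

Our assumption being that $\sum M_n < \infty$ we have $M_n \to 0$ as $n \to \infty$. It follows
that there is a constant $p > 0$ independent of $k$ and $n$ such that

$$m\Big(x; \Big|\sum_{j \le n,\ j \ne k} f_j(x)\Big| < n\Big) \ge p.$$

Now, writing set intersections as products, we have

$$\bigcup_{k=1}^{n} (x; |f_k(x)| > 2n) \cdot \Big(x; \Big|\sum_{j \le n,\ j \ne k} f_j(x)\Big| < n\Big) \subset S_n.$$

Writing this for a moment as

$$\bigcup_{k=1}^{n} (R_k T_k) \subset S_n,$$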



where $R_k = (x; |f_k(x)| > 2n)$ etc. and denoting by $R'$ the complement of $R$,
we have

$$M_n = m(S_n) \ge m\Big(\bigcup_{k=1}^{n} (R_k T_k)\Big)$$

$$= m\Big(\bigcup_{k=1}^{n} (R_1 T_1)' \cdots (R_{k-1} T_{k-1})' R_k T_k\Big)$$

$$= \sum_{k=1}^{n} m\big((R_1 T_1)' \cdots (R_{k-1} T_{k-1})' R_k T_k\big)$$

$$\ge \sum_{k=1}^{n} m(R_1' \cdots R_{k-1}' R_k T_k)$$

$$\ge \sum_{k=1}^{n} \{m(R_k T_k) - m((R_1 \cup \cdots \cup R_{k-1}) R_k)\}$$

$$\ge \sum_{k=1}^{n} \{m(T_k) - (k - 1) m(R_1)\}\, m(R_k)$$

$$\ge \sum_{k=1}^{n} \{p - n\, m(R_1)\}\, m(R_k) \ge \sum_{k=1}^{n} (p - o(1))\, m(R_k)$$

$$\ge p' \sum_{k=1}^{n} m(R_k) = p' n \int_{|t| > 2n} dG(t)$$

by (14) since $m(R_1) = \int_{|t| > 2n} dG(t)$, $n\, m(R_1) \to 0$ as $n \to \infty$.

Thus

$$\sum_{n} n \int_{|t| > 2n} dG(t) \le \frac{1}{p'} \sum_{n} M_n < \infty.$$

Hence we have (11), which is equivalent to (2). The proof of (1') is quite easy.
By virtue of (2) we can put

$$\int_{-\infty}^{\infty} t\,dG(t) = C.$$

If $|C| > 1$, then it follows from (2) and Tchebycheff's inequality that $M_n \to 1$ as
$n \to \infty$, thus $|C| \le 1$. But if $|C| = 1$, we conclude from (2) and the central limit
theorem that $M_n$ does not tend to 0. Hence $|C| < 1$, and (1') is proved.

By similar methods we can prove the following results: Let 2 < c < 4. Put 



Then the necessary and sufficient condition for the convergence of $\sum M_n^{(c)}$
is that

$$\int_{-\infty}^{\infty} t\,dG(t) = 0, \qquad \int_{-\infty}^{\infty} |t|^c\,dG(t) < \infty.$$

If $c < 2$ then the necessary and sufficient condition for the convergence of
$\sum M_n^{(c)}$ is that $\int_{-\infty}^{\infty} |t|^c\,dG(t) < \infty$.

Finally we can prove the following result: Assume that / tdG(t) = 0 and 

J—oo 

r°° 

/ t* dG(t) < so. Then there exists a constant r so that 


(17) Xrf m \ x \ I £ /«.(*) I > n Ui ■ n Y < 00 ■ 

n—1 L | k—1 ] 

The case of the Rademacher functions shows that (17) can not be improved
very much; in fact only the value of r could be improved.



NOTES 

This section is devoted to brief research and expository articles on methodology and
other short items.

BROWNIAN MOTION ON THE SURFACE OF THE 3-SPHERE 

By Kôsaku Yosida

Mathematical Institute, Nagoya University 

1. Introduction. Let S be an n-dimensional compact Riemann space with the
metric $ds^2 = g_{ij}(x)\,dx^i dx^j$ such that the totality G of the isometric transformations
of S onto S constitutes a Lie group transitive on S. Consider a temporally
homogeneous Markoff process by which $P(t, x, y)$, $t > 0$, is the transition probability
that a point x is transferred to y after the elapse of t-unit time. We
assume that $P(t, x, y)$ is a Baire function in $(t, x, y)$ and continuous in t, then P
satisfies Smoluchowski's equation

(1.1) $P(t + s, x, y) = \int_S P(t, x, z)P(s, z, y)\,dz \quad (t, s > 0),$

dz being the G-invariant measure $\sqrt{g(x)}\,dx^1 dx^2 \cdots dx^n$, $g(x) = \det(g_{ij}(x))$, and

(1.2) $P(t, x, y) \ge 0,$

(1.3) $\int_S P(t, x, y)\,dy = 1.$

The spatial homogeneity of the transition process may be defined by

(1.4) $P(t, Tx, Ty) = P(t, x, y)$ for $T \in G$.

The "continuity" of the transition process may be defined, following A.
Kolmogoroff and W. Feller,¹ as follows. Let $L_1(S)$ be the function space of
integrable (with respect to dx) functions $f(x)$ on S, then, for those $f(x)$ which are
dense in $L_1(S)$,

(1.5) $\dfrac{\partial f(t, x)}{\partial t} = (Af)(t, x), \quad (t > 0);$

$f(t, x) = \int_S f(y)P(t, y, x)\,dy, \quad (t > 0), \quad f(0, x) = f(x),$

where, with non-negative $b^{ij}(x)$,

(1.6) $(Af)(x) = \dfrac{1}{\sqrt{g(x)}}\dfrac{\partial}{\partial x^i}\left(-\sqrt{g(x)}\,a^i(x) f(x)\right) + \dfrac{1}{\sqrt{g(x)}}\dfrac{\partial^2}{\partial x^i \partial x^j}\left(\sqrt{g(x)}\,b^{ij}(x) f(x)\right).$

¹ A. Kolmogoroff, "Zur Theorie der stetigen zufälligen Prozesse," Math. Annalen, Vol.
108 (1933); W. Feller, "Zur Theorie der stochastischen Prozesse," Math. Annalen, Vol. 113
(1937).


The temporally and spatially homogeneous "continuous" Markoff process
may, if it exists, be called a Brownian motion on the homogeneous space S.
The purpose of the present note is to show that, under some derivability hypothesis
concerning $a^i(x)$ and $b^{ij}(x)$, there exists one and (essentially) only one Brownian
motion on the surface of the 3-sphere $S^3$.

I here express my hearty thanks to Dr. Kiyosi Itô who proposed to me the
problem and discussed and much improved the manuscript.


2. The defining equation for the Brownian motion. The spatial homogeneity
(1.4) is equivalent to the fact that A is commutative with every operator T defined
by

(2.1) $(Tf)(x) = f(Tx), \quad T \in G,$

because we have

$$\int f(y)P(t, y, Tx)\,dy = \int f(Ty)P(t, Ty, Tx)\,dTy = \int f(Ty)P(t, y, x)\,dy.$$

The condition (2.1) is equivalent to

(2.2) $XA = AX$ for any infinitesimal operator $X = \xi^i(x)\,\dfrac{\partial}{\partial x^i}$

induced on S by the infinitesimal operator of the Lie group G. Thus, assuming
the derivability of $a^i(x)$ and $b^{ij}(x)$ of necessary orders, we obtain from (2.2) the
conditions:


(2.3) 


(2 4) 


(2.5) 


(a'w - -Viwa‘(x> + jVgMgM ), 

vk ™ TF* + S tin H>( 4 

mx) = g'(x) + ± (Vim v 3 (*)), 

»"(») ^ + »*<*> d W ' { ' fo) Tjr- 


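Now for the surface of the 3-sphere $S^3$,

$$ds^2 = d\theta^2 + \sin^2\theta\,d\varphi^2, \qquad g(\theta, \varphi) = \sin^2\theta,$$

and the infinitesimal operators

$$X_x = \sin\varphi\,\frac{\partial}{\partial\theta} + \frac{\cos\theta\cos\varphi}{\sin\theta}\frac{\partial}{\partial\varphi}, \qquad X_y = \cos\varphi\,\frac{\partial}{\partial\theta} - \frac{\cos\theta\sin\varphi}{\sin\theta}\frac{\partial}{\partial\varphi}, \qquad X_z = \frac{\partial}{\partial\varphi}$$

respectively correspond to the rotations about the x-, y- and z-axis.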

From (2.5) we see that, by taking $X = X_z$,

(2.6) $b^{ij}(\theta, \varphi)$ is independent of $\varphi$.

By taking $X = X_z$ in (2.4) we see that $H^k$ is independent of $\varphi$. Hence, by

(2.7) $a^i(\theta, \varphi)$ is independent of $\varphi$.

Thus, by taking k = 1, X = X z we obtain from (2.4), 

shTs^ C0S * ~ sin v = sin ^ | II 1 (8)) 

and thus 

(2.8) lf{0) = 0, b n {6) + - (~J- a H\0)) = 0. 

ad \sin 0 / 

Hence, by taking k = 2, X = X x or X = X v , we obtain from (2.4) 

~H\d) cos tp , 2&ll . . cos 0 ooa <p 2b li (6) ain ^ _ h n (fi) cos 0 cos <p 
sin’ 6 sin* 0 'am 0 ' ' sin 0 

gl ( fl ) ain f _ 9 ;h n Cfli cos 0 sin <p 9Ji m^ cos y cos 0 sin <p 

am’ 6 sin* 9 K J sin* 0 1 ^ sin 0 

From these two equations we obtain 

(2.9) b'\e) = o, - 2b”(e) ~ + b M (d) — = o. 

sin* 9 sux*0 sin 0 

By takmg t = 2, k = 1, X = X „, we obtain from (2.5), (2.9) 

b 22 (0) cos + b u (0) ^ ( C ^ a 8 009 = o 

a# \ Bin 0 / 

and hence 

(2.10) b w (0) = . 

v ' sin* 6 

Similarly by taking t = 1, k =a 1, X = X, we obtain from (2.5) 

b 12 (0) cos <9 + i> l2 (0) cos <p = sin <p ^ 

ad 

and hence by (2.9), (2.10) 


(2.11) b u (0) = constant C, 

Thus we obtain from (2.4) 


6 M (0) 


C 

sin 2 6 ' 





H\Q) = ~a\e) sin 0 + 2 C cos 0, H\o) = - sin Q-a(6) 
and thus, by (2,8), 

( 2 . 12 ) a \d) = 0 . 

Substituting (2.11) in (2.9) we obtain 


(2.13) a'(6) = . 

sin 0 

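Therefore since $b^{11}(\theta)$ and $b^{22}(\theta)$ are non-negative, A is (essentially) equal to
the Laplace operator

(2.14) $\dfrac{1}{\sin\theta}\dfrac{\partial}{\partial\theta}\sin\theta\,\dfrac{\partial}{\partial\theta} + \dfrac{1}{\sin^2\theta}\dfrac{\partial^2}{\partial\varphi^2}.$

Thus we may obtain $P(t, x, y)$ by integrating the equation

(2.15) $\dfrac{\partial f(t; \theta, \varphi)}{\partial t} = A \cdot f(t; \theta, \varphi), \quad (t \ge 0),$

and by putting

(2.16) $f(t; \theta, \varphi) = f(t, x) = \int_{S^3} f(y)P(t, y, x)\,dy.$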


3. Integration of the equation (2.15)-(2.16). Consider the Laplacian (real)
spherical harmonics

(3.1) $Y_k^{(m)}(\theta, \varphi) = Y_k^{(m)}(x), \quad (-k \le m \le k;\ k = 0, 1, \cdots).$

They constitute an orthonormal function system complete for continuous
functions on $S^3$, and we have

(3.2) $A \cdot Y_k^{(m)}(\theta, \varphi) = -k(k + 1)\,Y_k^{(m)}(\theta, \varphi).$

Since, as is well-known,

(3.3) $Y_k^{(m)}(T^{-1}x) = \sum_{n=-k}^{k} u_{mn}^{(k)}(T)\,Y_k^{(n)}(x)$

by an irreducible orthogonal representation $(u_{mn}^{(k)}(T))$ of the rotation group G,
we have

(3.4) $\max_x |Y_k^{(m)}(x)|^2 \le (2k + 1) \min_x \sum_{n=-k}^{k} |Y_k^{(n)}(x)|^2,$

by applying the Schwarz inequality and the transitivity of the group G on
$S^3$. The right hand member satisfies, by the orthonormality,

(3.5) $\le (2k + 1)^2 / (\text{area of } S^3).$

Therefore the double series (for $t > 0$)

(3.6) $P(t; \theta, \varphi; \theta', \varphi') = \sum_{k=0}^{\infty} \sum_{m=-k}^{k} \exp(-k(k + 1)t)\,Y_k^{(m)}(\theta, \varphi)\,Y_k^{(m)}(\theta', \varphi')$

is absolutely and uniformly convergent on $S^3$. We will show that this P is
the required (unique) Brownian motion on $S^3$.
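A series of the type (3.6) can be evaluated numerically. The sketch below is an illustration under an assumption not made in the text: it uses the classical addition theorem for spherical harmonics, by which the inner sum over m equals $(2k + 1)P_k(\cos\gamma)/4\pi$, with $P_k$ the Legendre polynomial and $\gamma$ the angular distance between the two points. The series is truncated at a hypothetical `kmax`, and the code checks the normalisation (1.3) and the sign condition (1.2) at one value of t.

```python
import math

def legendre_values(x, kmax):
    """P_0(x), ..., P_kmax(x) by the three-term recurrence."""
    vals = [1.0, x]
    for k in range(1, kmax):
        vals.append(((2 * k + 1) * x * vals[k] - k * vals[k - 1]) / (k + 1))
    return vals[:kmax + 1]

def heat_kernel(t, gamma, kmax=40):
    """Truncated series for the transition density at angular distance gamma."""
    p = legendre_values(math.cos(gamma), kmax)
    return sum((2 * k + 1) / (4 * math.pi) * math.exp(-k * (k + 1) * t) * p[k]
               for k in range(kmax + 1))

def total_mass(t, steps=2000):
    """Integral of the kernel over the sphere (Simpson rule in gamma)."""
    h = math.pi / steps
    g = lambda gam: 2 * math.pi * heat_kernel(t, gam) * math.sin(gam)
    total = g(0.0) + g(math.pi)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * g(j * h)
    return total * h / 3.0

print(total_mass(0.5), heat_kernel(0.5, math.pi))
```

The total mass is 1 up to quadrature error because only the k = 0 term has a non-zero integral, and the smallest value, attained at the antipodal point, stays positive.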

The proof may be given in three steps. i) We see by (3.2) and (3.6), that
$f(t, x) = \int_{S^3} f(y)P(t, y, x)\,dy$ satisfies (2.15) if

$$f(x) = \sum_{k=0}^{\infty}\sum_{m=-k}^{k} d_k^{(m)} Y_k^{(m)}(x), \qquad \sum_{k=0}^{\infty}\sum_{m=-k}^{k} \exp(-k(k + 1)t)\,k(k + 1)\,d_k^{(m)} Y_k^{(m)}(x)$$

are both absolutely and uniformly convergent. By the completeness of $\{Y_k^{(m)}(x)\}$,
such $f(x)$ are dense in $L_1(S)$.

ii) Because of (3.3) we see that (3.6) satisfies the spatial homogeneity (1.4).

iii) (1.3) is obvious by the orthonormality of $\{Y_k^{(m)}(x)\}$ and the constancy
on $S^3$ of $Y_0^{(0)}(x)$. Next, for the solution $f(t, x)$ of (2.15)-(2.16), let $f(x) =
f(0, x)$ be non-negative on $S^3$, then $g_\epsilon(t, x) = \exp(-\epsilon t) f(t, x)$, $(\epsilon > 0)$, satisfies

$$\frac{\partial g_\epsilon}{\partial t} = A \cdot g_\epsilon(t, x) - \epsilon\, g_\epsilon(t, x), \quad (t > 0),$$

$$g_\epsilon(0, x) = f(x) \ge 0 \quad (\text{on } S^3).$$

Thus $g_\epsilon(t, x) \ge 0$ on $S^3$, since $g_\epsilon(t, x)$ cannot have a negative minimum on the
product space $[t_1, t_2] \times S^3$, for any $t_2 > t_1 > 0$. For at such minimizing point
we must have







^ 0 . 


Therefore, since $\epsilon > 0$, $t_2 > t_1 > 0$ were arbitrary, we conclude that $f(t, x) \ge
0$ on $S^3$ for $t > 0$ if $f(x) \ge 0$ on $S^3$. This proves (1.2). The same argument
simultaneously shows us that the solution P of (2.15)-(2.16) and (1.2)-(1.3) is
unique.


ON THE STRONG STABILITY OF A SEQUENCE OF EVENTS 

By Aryeh Dvoretzky

Hebrew University, Jerusalem, and Institute for Advanced Study 

1. Summary. M. Loève [3] has found conditions under which a sequence of
events which may be interdependent in an arbitrary manner is strongly stable.
In this note it is established that considerably weaker conditions imply the
strong stability.


2. Introduction. Let

(1) $A_1, A_2, \cdots, A_n, \cdots$

be a sequence of events, which may depend on each other in any way whatsoever,
defined on the same set of trials.

Let $R_n$ be the repetition function of (1), i.e. $R_n$ is the number of those among the
first n events: $A_1, A_2, \cdots, A_n$ which were realized, and put $f_n = R_n/n$. The
random variable $f_n$ is called the frequency function of (1).

Denoting by $E\{x\} = \bar{x}$ the expected value of $x$ it is evident that

$$\bar{R}_n = E\{R_n\} = \sum_{i=1}^{n} \Pr(A_i), \qquad \bar{f}_n = E\{f_n\} = \frac{1}{n}\sum_{i=1}^{n} \Pr(A_i).$$

Following Loève [3, p. 252] we say that (1) is strongly stable if the sequence
$\varphi_n = f_n - \bar{f}_n$ $(n = 1, 2, \cdots)$ is strongly stable in the usual Kolmogoroff sense
[1, p. 58], i.e. if

(2) $\lim_{n \to \infty} \Pr(\sup_{\nu \ge n} |\varphi_\nu| > \epsilon) = 0$

for every $\epsilon > 0$.

Putting¹

$$\beta_n = \frac{1}{n}\sum_{i=1}^{n} \Pr(A_i), \qquad \gamma_n = \frac{2}{n(n - 1)}\sum_{1 \le \mu < \nu \le n} \Pr(A_\mu A_\nu)$$

and introducing the abbreviation²

$$\delta_n = \gamma_n - \beta_n^2,$$

Loève's result [3, pp. 257-9] is the following:

If $n\delta_n$ is bounded then (1) is strongly stable.

This, even when specialized to sequences of independent events, includes the
Bernoulli and Poisson cases.

Here the following stronger result will be established.

Theorem. If $\sum \delta_n/n$ is convergent then (1) is strongly stable.

In particular, if for some $\epsilon > 0$ the sequence $n^{\epsilon}\delta_n$ is bounded then (1) is strongly
stable.

3. A lemma. The new tool here used is the following simple result on series of
positive terms.

Lemma. Let $a_n > 0$ for $n = 1, 2, \cdots$ and

(3) $\sum_{n=1}^{\infty} \dfrac{a_n}{n}$

be convergent. Then there exists a sequence $n_i$ of integers satisfying

(4) $0 < n_{i+1} - n_i = o(n_i) \quad (i \to \infty),$

and such that the series $\sum_{i=1}^{\infty} a_{n_i}$ is convergent.


¹ $A_\mu A_\nu$ denotes the event: both $A_\mu$ and $A_\nu$.

² Our $\beta_n$, $\gamma_n$ and $\delta_n$ correspond to Loève's $p_1(n)$, $p_2(n)$ and $d_n$ respectively.



Proof. Since (3) is convergent it is well known³ that there exists a sequence
of numbers $l_n$ $(n = 1, 2, \cdots)$ satisfying

(5) $l_{n+1} \ge l_n, \quad \lim l_n = \infty$

having the property that

(6) $\sum_{n=1}^{\infty} \dfrac{l_n a_n}{n} < \infty.$


We define inductively a sequence of integers $m(i)$ through


(7) 


m(l) = 1, m(i + 1) = m(i) + 1 + , 

J 


the square brackets denoting the integral part. Clearly

(8) $0 < m(i + 1) - m(i) = o(m(i)).$

Now for every $i$ we choose $n_i$ so that

$$m(i) \le n_i < m(i + 1) \quad\text{and}\quad a_{n_i} = \min_{m(i) \le n < m(i+1)} a_n.$$

These $n_i$ satisfy the requirements of the lemma.

Indeed, (4) holds in virtue of (8) while applying (5) and (7) we obtain 




= 2 . If - > ( rn(i 4 - 1 ) — 




m(i) 


(i) 


m{i + 1) m(i + 1) 


* n{ ■ 


Since $\sum s_i$ converges by (6) it follows from the preceding inequality and (8) that
$\sum a_{n_i} < \infty$ as required.

Corollary. The conclusion of the lemma remains valid if the condition $a_n > 0$
is dropped provided (3) is absolutely convergent.
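The construction in the proof can be tried on a concrete series. The sketch below is illustrative and rests on assumptions: the step rule $m(i + 1) = m(i) + 1 + [m(i)/l_{m(i)}]$ is one rule compatible with (8), not necessarily the author's exact formula (7), and the choices $a_n = n^{-1/2}$, $l_n = n^{1/4}$ are hypothetical examples for which (3) and (6) converge while $\sum a_n$ diverges.

```python
import math

def build_subsequence(a, ell, limit):
    """Split 1..limit into blocks [m(i), m(i+1)) and pick in each an index minimising a."""
    m, picks = 1, []
    while True:
        # Assumed step rule: block length 1 + [m / l_m], which is o(m) because l_m grows.
        m_next = m + 1 + math.floor(m / ell(m))
        if m_next > limit:
            break
        picks.append(min(range(m, m_next), key=a))
        m = m_next
    return picks

a = lambda n: n ** -0.5      # sum a_n / n converges, sum a_n diverges
ell = lambda n: n ** 0.25    # increasing to infinity, sum l_n a_n / n converges

picks = build_subsequence(a, ell, 100_000)
relative_gaps = [(q - p) / p for p, q in zip(picks, picks[1:])]
subseries = sum(a(n) for n in picks)
print(picks[:5], relative_gaps[-1], subseries)
```

The relative gaps shrink, as (4) requires, while the sum of the selected terms stays bounded.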


4. Proof of the theorem. An easy calculation [3, p. 253] gives

$$\sigma_n^2 = E\{(f_n - \bar{f}_n)^2\} = \delta_n + \frac{\beta_n - \gamma_n}{n}.$$

Since both $\beta_n$ and $\gamma_n$ are between zero and one we have

$$-\frac{1}{n} \le \sigma_n^2 - \delta_n \le \frac{1}{n}.$$

Therefore it follows from the assumption of the theorem that $\sum (\sigma_n^2/n)$ is convergent.
Hence by the lemma there exists a sequence of integers $n_i$ satisfying
(4) and such that $\sum \sigma_{n_i}^2$ converges.
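The relation between $\sigma_n^2$ and $\delta_n$ can be verified in exact arithmetic. The sketch below assumes, purely for illustration, independent events with hypothetical probabilities $\Pr(A_i) = 1/(i + 1)$, so that $\Pr(A_\mu A_\nu)$ is a product and $\sigma_n^2$ is the variance of a mean of independent indicators; it then checks the decomposition $\sigma_n^2 = \delta_n + (\beta_n - \gamma_n)/n$, which follows from the definitions, and the bound $|\sigma_n^2 - \delta_n| \le 1/n$.

```python
from fractions import Fraction
from itertools import combinations

def stability_quantities(p):
    """beta_n, gamma_n, delta_n and sigma_n^2 for independent events with Pr(A_i) = p[i]."""
    n = len(p)
    beta = sum(p) / n
    gamma = Fraction(2, n * (n - 1)) * sum(x * y for x, y in combinations(p, 2))
    delta = gamma - beta ** 2
    # Variance of the frequency f_n = R_n / n for independent indicators.
    sigma2 = sum(x * (1 - x) for x in p) / n ** 2
    return beta, gamma, delta, sigma2

p = [Fraction(1, k + 2) for k in range(12)]
beta, gamma, delta, sigma2 = stability_quantities(p)
print(delta, sigma2, sigma2 - delta)
```

Because the arithmetic is rational, the decomposition holds exactly and not merely to rounding error.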


3 Take e.g, l„ = (s,>„ (of [2, p. 299]). 



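Applying Tchebycheff's inequality to $\varphi_{n_\nu} = f_{n_\nu} - \bar{f}_{n_\nu}$ and adding for $\nu \ge i$
we have for every $\epsilon > 0$

(9) $\Pr(\sup_{\nu \ge i} |\varphi_{n_\nu}| > \epsilon) \le \dfrac{1}{\epsilon^2}\sum_{\nu=i}^{\infty} \sigma_{n_\nu}^2.$

If $n_i \le n < n_{i+1}$ then

$$|f_n - f_{n_i}| = \left|\frac{R_n}{n} - \frac{R_{n_i}}{n_i}\right| \le \frac{n_{i+1} - n_i}{n_i}.$$

Denoting the last term of this inequality by $\epsilon_i$ and putting $\bar\epsilon_i = \max_{\nu \ge i} \epsilon_\nu$, we
have from (9)

$$\Pr(\sup_{n \ge n_i} |\varphi_n| > \epsilon + 2\bar\epsilon_i) \le \frac{1}{\epsilon^2}\sum_{\nu=i}^{\infty} \sigma_{n_\nu}^2.$$

As $\bar\epsilon_i \to 0$ and the right hand term is the remainder of a convergent series, (2)
follows and the theorem is proved.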

5. Remarks. 1. The lemma used here can also be applied to the study of
the order of magnitude of $\varphi_n$ in the almost certain sense.

2. If the terms of (3) are decreasing then the existence of a convergent subseries
of $\sum a_n$ satisfying (4) implies $\sum_{i=1}^{\infty} a_{2^i} < \infty$. But this is equivalent to the
convergence of the series with monotone terms (3) (cf. e.g. [2, p. 130]). Hence
in this case the convergence of (3) is necessary as well as sufficient for the validity
of the lemma. It may be possible to use this remark in order to establish in
some special cases, where the interdependence of the variables decreases steadily
in a suitable sense, necessary and sufficient conditions for strong stability.

3. The sequence of $\delta_n$ is, of course, of very specialized structure. Thus, since
the stability of (1) is equivalent [3, p. 255] to $\delta_n \to 0$ and is implied by strong
stability, it follows that $\delta_n \to 0$ whenever $\sum (\delta_n/n)$ is convergent.

Added in proof: Since this paper was submitted I heard from Professor M.
Loève that he has independently obtained the theorem of section 2 by another
method.

REFERENCES

[1] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung, Ergeb. d. Math., Vol. 2,
no. 3, Springer, Berlin, 1933.

[2] K. Knopp, Theory and Application of Infinite Series, Blackie, London and Glasgow,
1928.

[3] M. Loève, "Étude asymptotique des sommes de variables aléatoires liées," Jour. de
Math. pures et appl., Vol. 24 (1946), pp. 249-318.





A NOTE ON WEIGHING DESIGN 

By K. S. Banerjee
Pusa, Bihar, India 

1. Efficiency of weighing designs given by a three-fourth replicate. In the
June issue of the Annals, Kempthorne [1] approached the construction of the
orthogonal matrix X through fractional replicates, the original treatment of
which was given by Finney [2]. Reference has been made to the use of a three
fourth replicate for weighing designs. Details for such designs have not been
furnished as their efficiency is lower than for the designs given by the completely
orthogonal matrix X. In a three fourth replicate the treatment combinations
have to be chosen in a particular manner for a comparatively easier
analytical treatment both from the point of view of agrobiological experiments
as well as weighing designs. The variance of each of the estimates in such a case
will be $\sigma^2/2^{n-1}$. As a matter of fact, in a weighing design given by a fractional
replicate of the type of $(2^{\beta} - 1)/2^{\beta}$, $(\beta = 1, 2, \cdots, n)$, of $2^n$ experiments, the
estimate of the variance of each object is independent of the fraction used and
is equal to $\sigma^2/2^{n-1}$, the same as above.

2. Construction of a three fourth replicate. Kempthorne mentions that a
factorial design of fraction ¾ could be taken to consist of a ½ replicate on the
identity I = ABC and a quarter replicate based on the identity

I = A = BC = ABC.

If the half replicate based on the identity I = ABC be taken to consist of all the
treatments corresponding to the minus signs of the treatment contrast ABC [3],
the additional quarter replicate can be chosen in two different ways. When
however the treatments corresponding to the minus signs of both A and BC
are kept, omitting the treatments corresponding to the plus signs of A and BC,
the three fourth replicate so obtained will have certain advantages, which will
not be available if the quarter replicate to be added is chosen to consist of the
treatments corresponding to the plus signs of A and BC.

3. Behavior of the contrasts in a three fourth replicate and the efficiency of
the weighing designs. In general, if there are n treatments giving rise to $2^n$
treatment combinations and if the defining contrasts be chosen as

I = ACD = BDE = ABCE,

it will be necessary to omit the treatment combinations corresponding to the
plus signs of both ACD and BDE, which will be $2^{n-2}$ in number. In the three
fourth replicate so obtained, $2^n$ treatment effects (inclusive of the mean) will
divide themselves into sets of 4 treatment contrasts each. One of the sets will
be I, ACD, BDE and ABCE and any other set will be formed by multiplying
any treatment contrast by the defining set namely, I, ACD, BDE and ABCE.
Only three contrasts out of four in a set will be independent, so that only one of
the contrasts, preferably the one of the highest order interaction may be kept
as an alias (in agrobiological experiments) of the remaining three and may
therefore be omitted. Each of the four contrasts within a set will be orthogonal
to each of the other contrasts in the remaining sets, but within a set the four
contrasts will be non-orthogonal to one another. Though non-orthogonal, the
normal equations will be of the systematic type¹ and the matrix X'X, taking
any three contrasts out of each set of four, will take the following form:


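(1)
$$\begin{bmatrix}
x & a & a & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
a & x & a & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
a & a & x & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & x & a & a & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & a & x & a & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & a & a & x & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & x & a & a & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & a & x & a & \cdots \\
0 & 0 & 0 & 0 & 0 & 0 & a & a & x & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
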


where the order of the matrix $N = \tfrac{3}{4}2^n$ is of the form $3t$ $(t = 2^{n-2})$ and $x = 3 \cdot 2^{n-2}$,
$a = -\tfrac{1}{2}2^n + \tfrac{1}{4}2^n = -2^{n-2}$. The value of the above determinant $= (x - a)^{2t}
(x + 2a)^{t}$ and that of the determinant suppressing the first row and the first
column $= (x - a)^{2t-1}(x + a)(x + 2a)^{t-1}$. $a^{11} = (x + a)/(x - a)(x + 2a) =
1/2^{n-1}$, substituting for x and a. The variance of each estimate will therefore
be $\sigma^2/2^{n-1}$.

4. General case. When a fraction of the type $s/2^p = (2^p - 1)/2^p$ is used,
the treatment combinations corresponding to the plus signs of the p independent
contrasts are omitted. Out of each set of $2^p$ treatment contrasts, only $s = 2^p - 1$
will be independent and the matrix will then take a form like that of (1), where

$x = [(2^p - 1)2^n]/2^p = 2^{n-p}(2^p - 1)$ and
$a = -2^{n-1} + [(2^{p-1} - 1)2^n]/2^p = -2^{n-p}$,

$a^{11} = [x + (s-2)a]/\{(x-a)[x + (s-1)a]\} = (2\cdot 2^{n-p})/(2^n\, 2^{n-p}) = 1/2^{n-1}$.

The variance of each estimate = $\sigma^2/2^{n-1}$, the same as before. When a
completely orthogonalised matrix of the order $(s2^n)/2^p = 2^{n-p}(2^p - 1)$ is available,
the variance of an estimate will be $\sigma^2/[2^{n-p}(2^p - 1)]$. The ratio of the two
variances = $2^{n-1}/(2^n - 2^{n-p}) = 2^{p-1}/(2^p - 1)$, which shows how the efficiency
of the weighing design decreases with the increasing value of the fraction.
When p = 1, i.e. in a half replicate, the efficiency is 100 percent. The value of
the fraction is never less than 

¹ The analysis of the data available from agrobiological experiments will not be
cumbersome to a prohibitive extent as in many other experiments where non-orthogonality creeps
in. The results of investigation in this direction have already been communicated for
publication elsewhere.
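
The variance factor derived in the last two sections can be checked with exact arithmetic. The following is only a sketch under stated assumptions: the names `block_inverse_diagonal`, `variance_factor` and `efficiency` are hypothetical, `p` denotes the number of independent defining contrasts (so the fraction used is (2^p - 1)/2^p) and `s = 2^p - 1` is the number of contrasts kept from each alias set.

```python
from fractions import Fraction


def block_inverse_diagonal(x, a, s):
    """Diagonal element of the inverse of an s-by-s block with x on the
    diagonal and a elsewhere: [x + (s-2)a] / ((x-a)[x + (s-1)a])."""
    x, a = Fraction(x), Fraction(a)
    return (x + (s - 2) * a) / ((x - a) * (x + (s - 1) * a))


def variance_factor(n, p):
    """Multiplier of sigma^2 in the variance of each estimate when the
    fraction (2^p - 1)/2^p of a 2^n factorial is used."""
    s = 2**p - 1              # independent contrasts kept per alias set
    x = 2**(n - p) * s        # diagonal of X'X = number of weighings
    a = -(2**(n - p))         # off-diagonal element within an alias set
    return block_inverse_diagonal(x, a, s)


def efficiency(n, p):
    """Variance under a fully orthogonal design with the same number of
    weighings, divided by the variance under the fractional replicate."""
    weighings = 2**(n - p) * (2**p - 1)
    return Fraction(1, weighings) / variance_factor(n, p)


# Three-quarter replicate (p = 2) of a 2^5 factorial.
print(variance_factor(5, 2), efficiency(5, 2))
```

For p = 2 the factor should reduce to 1/2^(n-1) and the efficiency to 2^(p-1)/(2^p - 1) = 2/3, in agreement with the expressions in the text.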









K. S. BANERJEE 


5. Independence of the estimates given by $L_N$ in a biased spring balance.
Kempthorne mentions that although the optimum designs for the spring balance
case suggested by Mood furnish somewhat smaller variance than what is given
by fractional replicates, these designs have the disadvantage that the estimates
are correlated, whereas the estimates furnished by fractional replicates are
orthogonal. The designs furnished by fractional replicates take account of the
bias and if the weighing operation corresponding to the bias is omitted (in case
the spring balance is free from bias), the resultant scheme will fail to give
independent estimates and the variance factors will be of the same magnitude
as in the optimum design $L_N$ of Mood with the same number of weighings.
Again, these optimum designs may also be made to furnish independent estimates
when the designs are adjusted in the manner suggested by Mood to suit a
biased spring balance.

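It is true that the design matrix $L_3$ given by

$$
X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
$$

does not give independent estimates as such; but when it is assumed that the
spring balance has a bias and the design matrix is modified as follows:

$$
X = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix},
\tag{2}
$$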


the estimates except that for the bias will be orthogonal to one another and the 
variance of the estimated weights will necessarily be larger in value. 

Before proving the general case, we notice that when −1 is substituted for 0
in (2) above, the resultant scheme will be an orthogonalised matrix. This is
true not only in this particular instance but will hold good also in general. The
constitution will be clear when the method of construction of $L_N$ from $H_{N+1}$
is recalled.

The distribution of ones in $L_N$ gives a special type of symmetrical balanced
incomplete block design, where $r = k = \tfrac{1}{2}(b + 1)$ and $\lambda = \tfrac{1}{4}(b + 1)$, while the
distribution of zeros gives the complementary design for which $r_0 = r - 1$,
$k_0 = k - 1$ and $\lambda_0 = \lambda - 1$. Therefore when a row of zeros and a column of
ones (in that order) is added to $L_N$, the matrix X'X of the resultant scheme takes
the following form:

$$
X'X = \begin{bmatrix}
N+1 & r & r & r & \cdots & r \\
r & r & \lambda & \lambda & \cdots & \lambda \\
r & \lambda & r & \lambda & \cdots & \lambda \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
r & \lambda & \lambda & \lambda & \cdots & r
\end{bmatrix}
\tag{3}
$$







Making use of the identities well known in the theory of balanced incomplete
block designs and remembering the relationships $2\lambda = r = k = \tfrac{1}{2}(N + 1)$, we have:

(I) The value of the determinant of X'X
$= (r - \lambda)^{N-1}[(N + 1)\{r + \lambda(N - 1)\} - r^2 N] = (r - \lambda)^{N-1}[r + \lambda(N - 1)]$,

(II) The value of the determinant suppressing the first row and the first column
$= (r - \lambda)^{N-1}[r + \lambda(N - 1)]$,

(III) The value suppressing the second row and the second column
$= (r - \lambda)^{N-2}[(N + 1)\{r + \lambda(N - 2)\} - r^2(N - 1)]$
$= (r - \lambda)^{N-2}[r + \lambda(N - 1)]$,

(IV) The value suppressing the first row and the third column
$= (r - \lambda)^{N-2}[r\{r + \lambda(N - 2)\} - r\lambda(N - 1)]$
$= r(r - \lambda)^{N-1}$,

(V) The value suppressing the second row and third column
$= (r - \lambda)^{N-2}[\lambda(N + 1) - r^2]$
$= 0$.


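Hence, the reciprocal matrix of X'X will be given by

$$
[X'X]^{-1} = \begin{bmatrix}
1 & -1/k & -1/k & \cdots & -1/k \\
-1/k & 2/k & 0 & \cdots & 0 \\
-1/k & 0 & 2/k & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-1/k & 0 & 0 & \cdots & 2/k
\end{bmatrix}
\tag{4}
$$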


Let Y' denote the column matrix of the results of the weighings, $y_0, y_1, \ldots, y_N$,
and B' the column matrix of the estimates of the weights $b_0, b_1, \ldots, b_N$. Then
the estimates will be given by the equation

$B' = [X'X]^{-1}X'Y'$.

It is easy to see that all the rows except the first in $[X'X]^{-1}X'$ are orthogonal to
one another. To explain this, let us take the design given by (2). Here

$$
X' = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{bmatrix}.
$$

Then $[X'X]^{-1}X'$ will be of the form

$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
-1/k & +1/k & +1/k & -1/k \\
-1/k & +1/k & -1/k & +1/k \\
-1/k & -1/k & +1/k & +1/k
\end{bmatrix}.
$$

In all the rows excepting the first, for every 0 and +1 in X', there will
respectively be a −1/k and a +1/k in $[X'X]^{-1}X'$. It has been mentioned before
that an orthogonal matrix is obtained when −1 is substituted for every 0 in X
or X'. Hence, N rows (all except the first) of $[X'X]^{-1}X'$ will be orthogonal
and these N rows will estimate the N weights in orthogonal linear combinations
of $y_0, y_1, \ldots, y_N$.
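
The orthogonality claimed for the design (2) can be verified numerically. The sketch below uses exact rational arithmetic and a small Gauss-Jordan routine; the helper names (`transpose`, `matmul`, `inverse`) are illustrative and not taken from the paper.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with exact rational arithmetic."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Design (2): L_3 with a row of zeros and a column of ones added for the bias.
X = [[1, 0, 0, 0],
     [1, 1, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 1]]
Xt = transpose(X)
XtX_inv = inverse(matmul(Xt, X))      # should have the pattern of (4) with k = 2
estimator = matmul(XtX_inv, Xt)       # rows give the estimates as combinations of y
weight_rows = estimator[1:]           # all rows except the one for the bias

for row in estimator:
    print([str(v) for v in row])
```

With k = 2 the weight rows consist of entries ±1/2, and their pairwise inner products vanish, which is the orthogonality described above.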

It has been mentioned before that the distribution of zeros in $L_N$ gives the
complementary design, for which $r_0 = r - 1$, $k_0 = k - 1$ and $\lambda_0 = \lambda - 1$.
If to such a design a row of ones and a column of ones (in that order) be added
to suit the estimation of the weights in a biased spring balance, exactly a similar
situation will be obtained and the estimates will be orthogonal. It can readily
be seen that the design furnished by Yates to weigh seven light objects and a
bias is an illustration of this kind. The scheme given by Yates is the
complementary design of $L_7$ with an additional row and a column of ones added to $L_7$.

The sixteen combinations of ten objects, a, b, c, d, e, f, g, h, k, l include 1,
which corresponds to weighing with empty pans or, in other words, which is
devoted to estimating the bias. When 1 is omitted, X'X will be of the form

$$
X'X = \begin{bmatrix}
r & \lambda & \lambda & \cdots & \lambda \\
\lambda & r & \lambda & \cdots & \lambda \\
\lambda & \lambda & r & \cdots & \lambda \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\lambda & \lambda & \lambda & \cdots & r
\end{bmatrix},
$$

where r = 8 and λ = 4. The above matrix X'X is obviously of the same form
as given by $L_{15}$.

By following exactly the same procedure as given above, it can easily be seen
that when the weighing operation 1 is included in the weighing design, the
solution of the normal equations will lead to independent estimates. The
absence of each letter will be a 0 and the presence a +1 in the design matrix
and if −1 is substituted for every zero, the resultant matrix will be orthogonal.
In some cases, however, the number of letters in all the combinations will not be
the same, i.e. k will not be constant. In such a situation, k in (4) will take the
value of r or of 2λ.


REFERENCES

[1] O. Kempthorne, "The factorial approach to the weighing problem," Annals of Math.
Stat., Vol. 19 (1948), pp. 238-245.

[2] D. J. Finney, "The fractional replication of factorial arrangements," Annals of
Eugenics, Vol. 12 (1945), pp. 291-301.

[3] F. Yates, Tech. Commun. Bur. Soil Sci. Harpenden no. 35 (1937), p. 11.

[4] Harold Hotelling, "Some improvements in weighing and other experimental
techniques," Annals of Math. Stat., Vol. 15 (1944), pp. 297-306.

[5] K. Kishen, "On the design of experiments for weighing," Annals of Math. Stat.,
Vol. 16 (1945), pp. 294-301.

[6] A. M. Mood, "On Hotelling's weighing problem," Annals of Math. Stat., Vol. 17 (1946),
pp. 432-446.

[7] R. L. Plackett and J. P. Burman, "The design of optimum multifactorial experiments,"
Biometrika, Vol. 33 (1946), pp. 305-325.










CONTROL CHART FOR LARGEST AND SMALLEST VALUES 

By John M. Howell
Los Angeles City College

1. Introduction. It may at times be desirable to use a control chart for
largest and smallest values (L & S) in place of the conventional charts for
averages and ranges ($\bar{X}$ & R). The chart for largest and smallest values has
certain advantages: all information may be combined on one chart, computations
are simple, and specifications may be placed on the chart. In this paper,
constants for the use of this chart are developed and comparison is made with
the average and range charts.

2. Constants for determining limits. Let L and S denote the largest and
smallest values, respectively, in a sample of n pieces, and let $\bar{L}$ and $\bar{S}$ denote the
averages of these values for k samples. Then $(\bar{L} + \bar{S})/2$ and $(\bar{L} - \bar{S})/d_2$ are
unbiased estimates of the population mean and standard deviation, respectively,
in the case of a random sample from a normal population. The value of the
constant $d_2$ is given in [1] and repeated in Table I for convenience. If we denote
$(\bar{L} + \bar{S})/2$ by M and $(\bar{L} - \bar{S})$ by $\bar{R}$, control limits may be determined in terms of
these statistics.

In conformance with usual control chart practice, we will set the upper control
limit at $\bar{L} + 3\sigma_L$ and the lower control limit at $\bar{S} - 3\sigma_S$, where $\sigma_L$ is an estimate
of the standard deviation of the largest values in samples drawn from a normal
population, and similarly for $\sigma_S$. The results of Tippett [2] and Pearson [3]
for E(R) of samples from a normal population were used to determine expected
values of L and S: $E(R) = d_2\sigma$. Here, R is the range of samples of size n:
R = L − S. But since $E[(L + S)/2] = a$ for a symmetrical distribution, then
$E(L) = a + d_2\sigma/2$ and $E(S) = a - d_2\sigma/2$, where a and σ are the mean and
standard deviation of the normal population from which samples are drawn.

The probability element of the largest value [4] is given by:

$$
n[F(L)]^{n-1} f(L)\,dL, \quad \text{where } f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-a)^2/2\sigma^2} \text{ and } F(x) = \int_{-\infty}^{x} f(y)\,dy.
$$

Then $E(L^2) = n\int_{-\infty}^{\infty} L^2 [F(L)]^{n-1} f(L)\,dL$. Integrals of this type, differing only
by a constant factor, have been evaluated by Hojo [5] and from his results $d_4$ was
determined so that $\sigma_L = \sigma_S = d_4\sigma$. Values for $d_4$ for n = 2, 5, 10 are also given
by Tippett [2]. "Three-sigma" control limits may then be given in the form:
$M \pm A_3\bar{R}$, where $A_3 = 0.5 + 3d_4/d_2$. The expected value of the upper control
limit will then be: $E(\mathrm{UCL}) = a + A_4\sigma$, where $A_4 = (d_2/2) + 3d_4$. Values of
these constants for various sample sizes are given in Table I.

In practice, it might be desired, in the case of control charts for individual
measurements or for L and S, to have $E(\mathrm{UCL}) = a + 3\sigma$, and the lower control
limit symmetrically placed with respect to the central line. In this case, the
formula for the limits would be: $M \pm 3\bar{R}/d_2$ or $M \pm \sqrt{n}\,A_2\bar{R}$, where
$A_2 = 3/(d_2\sqrt{n})$ is given in [1]. Since the efficiency of M decreases rapidly with
increasing sample size [6], it would probably be better to use $\bar{\bar{X}}$ in place of
M for determining the central line for a control chart when the sample size is
greater than five. $\bar{\bar{X}}$ is the "average of averages" as defined in [1].
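
As a rough illustration of how the limits of this section might be computed, the sketch below takes $d_2$ and $d_4$ as inputs (from Table I) and forms M, the mean range, $A_3 = 0.5 + 3d_4/d_2$ and $A_4 = d_2/2 + 3d_4$. The function names and the toy data are assumptions made for the example only.

```python
def ls_chart_constants(d2, d4):
    """A_3 multiplies the mean range in the limits M +/- A_3 * R-bar;
    A_4 gives the expected upper limit a + A_4 * sigma."""
    a3 = 0.5 + 3.0 * d4 / d2
    a4 = d2 / 2.0 + 3.0 * d4
    return a3, a4


def ls_chart_limits(samples, d2, d4):
    """Return (lower limit, central line M, upper limit) for the chart of
    largest and smallest values, from a list of equal-sized samples."""
    k = len(samples)
    l_bar = sum(max(s) for s in samples) / k   # average of largest values
    s_bar = sum(min(s) for s in samples) / k   # average of smallest values
    m = (l_bar + s_bar) / 2.0                  # central line
    r_bar = l_bar - s_bar                      # average range
    a3, _ = ls_chart_constants(d2, d4)
    return m - a3 * r_bar, m, m + a3 * r_bar


# Toy data: two samples of size three, with d_2 and d_4 read from Table I for n = 3.
print(ls_chart_constants(1.693, 0.748))
print(ls_chart_limits([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]], 1.693, 0.748))
```

Note that $M + A_3\bar{R}$ is algebraically the same as $\bar{L} + 3d_4\bar{R}/d_2$, so the single constant $A_3$ reproduces the limit $\bar{L} + 3\sigma_L$ described above.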

The chart for largest and smallest values would then consist of a chart on 
which both the largest and smallest values are plotted, with the central line at M, 
and the limits as given above. 

3. Comparison of charts for a particular case. A comparison of the L & S
chart with the $\bar{X}$ chart for a particular case in which the sample size was three is
given in Fig. 1. Measurements were the shear strength of spotweld coupons of
aluminum in pounds. Since the range chart had no points above the "three-sigma"
control limit and showed no other peculiarities, it has been omitted.


TABLE I

Constants for largest and smallest value chart

  n      d_2     d_4     A_2     A_3     A_4      n
  2     1.128   .825    1.880   2.72    3.03      2
  3     1.693   .748    1.023   1.82    3.09      3
  4     2.059   .709     .729   1.53    3.15      4
  5     2.326   .670     .577   1.36    3.17      5
  6     2.534   .648     .483   1.27    3.21      6
  7     2.704   .627     .419   1.20    3.23      7
  8     2.847   .614     .373   1.15    3.26      8
  9     2.970   .600     .337   1.10    3.28      9
 10     3.078   .588     .308   1.07    3.30     10


4. General comparison of charts. We assume a mean of zero and a standard
deviation of unity as a "given standard," and then compute the probabilities
when the true values are a and σ. The probability of a point being inside of
"3-sigma" control limits on the range chart under these conditions is:
$P_1 = \Pr(R < d_2 D_4/\sigma)$, where $D_4$ is given in [1]. The probabilities for the
range used here were found from the Pearson-Hartley tables [3]. The usual
normality assumptions are made.

The probability of a point being inside of "3-sigma" control limits on the
average chart under the same conditions is:

$$
P_2 = \int_{(\sqrt{n}/\sigma)\,((-3/\sqrt{n}) - a)}^{(\sqrt{n}/\sigma)\,((3/\sqrt{n}) - a)} \varphi(t)\,dt, \quad \text{where } \varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}.
$$

Since Daly [7] has shown that the average and range of samples from a normal







[Fig. 1. Chart for largest and smallest values. Vertical axis: shear strength of spotweld coupon in pounds.]


TABLE II

  n     a      σ      P_1     P_2    P_1P_2    P_3     N_1    N_2
  3     0     1.0    .994    .997    .991     .991     510    510
              1.2    .973    .988             .963     116    122
              1.5    .901    .955             .868      31     33
              2.0    .721    .866             .645      10     11
  3    0.5    1.0    .994    .983    .977     .980     198    228
              1.2    .973    .935    .935     .939      69     74
              1.5    .901    .917    .826     .834      25     27
              2.0    .721    .830    .598     .694       9     13
  3    1.0    1.0    .994    .898    .893     .931      41     65
              1.2    .973    .855    .832     .860      25     31
              1.5    .901    .802    .723     .740      15     17
              2.0    .721    .746    .538     .550       8      8
  3    2.0    1.0    .994    .323    .321     .590       5      9
              1.2    .973    .352    .342     .510       5      7
              1.5    .901    .378    .341     .414       5      6
              2.0    .721    .408    .294     .321       4      5
  5     0     1.0    .995    .997    .992     .992     570    570
              1.2    .969    .988    .957     .957     105    105
              1.5    .855    .955    .817     .878      23     36
              2.0    .588    .866    .509     .545       7      8
  5    0.5    1.0    .995    .970    .965     .980     130    227
              1.2    .969    .942    .913     .927      51     62
              1.5    .855    .891    .762     .791      17     20
              2.0    .588    .805    .473     .505       7      7
  5    1.0    1.0    .995    .776    .772     .923             58
              1.2    .969    .736    .713     .828             25
              1.5    .855    .695    .594     .661       9     12
              2.0    .588    .648    .381     .426       5      6
  5    2.0    1.0    .995    .071    .071     .512       2      7
              1.2    .969    .110    .107     .402       3      6
              1.5    .855    .164    .140     .286       3      4
              2.0    .588    .230    .135     .185       3      3










population are independent, the probability that a sample is within control
limits on both charts is the product of the probabilities: $P_1P_2$. Thus the
probability that a sample be outside of control limits on either chart is $1 - P_1P_2$.
The probability of the largest and smallest values both lying in the interval
from −c to c is:

$$
P_3 = \Pr(-c < S,\ L < c) = \left[\int_{(-c-a)/\sigma}^{(c-a)/\sigma} \varphi(t)\,dt\right]^{n}.
$$

Values of this expression with lower limit −∞ are given in Table XXI of [8] for samples of
sizes 3, 5, and 10. For the purpose of comparing the charts, we choose c so that
the probabilities of Type I errors are equal, that is: $1 - P_1P_2 = 1 - P_3$ or $P_1P_2 = P_3$
when the mean is zero and the standard deviation unity. Substituting in this
equation and solving, we find: $F(c) = 0.5 + 0.5\,(.9973\,P_1)^{1/n}$, where
$F(x) = \int_{-\infty}^{x} \varphi(t)\,dt$. For n = 3, c = 2.99 and for n = 5, c = 3.15.

Comparing $P_1P_2$ with $P_3$ when the true values are a and σ will then show the
relative power of the $\bar{X}$ & R charts and the L & S chart for detecting lack of
control.

Finally the charts are compared by finding the number ($N_1$ for the $\bar{X}$ & R
charts and $N_2$ for the L & S chart) of samples which will detect lack of control
with a .99 probability under the conditions given above. This is done by
finding the smallest integer which satisfies the following inequalities: $(P_1P_2)^{N_1} < .01$
and $P_3^{N_2} < .01$. As may be seen from Table II, under most conditions, the
L & S chart is nearly as good as the $\bar{X}$ & R charts for detecting lack of control.
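
The quantities of this section that do not require range tables can be sketched in a few lines. The code below is an illustration under the normality assumptions of the paper; the function names are hypothetical, the normal integral is obtained from `math.erf`, and $P_1$ is not computed because it needs the distribution of the range.

```python
from math import erf, sqrt


def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def p2_average_chart(n, a, sigma):
    """Probability that a sample average falls inside +/- 3/sqrt(n)
    when the true mean is a and the true standard deviation is sigma."""
    root_n = sqrt(n)
    upper = (root_n / sigma) * (3.0 / root_n - a)
    lower = (root_n / sigma) * (-3.0 / root_n - a)
    return phi_cdf(upper) - phi_cdf(lower)


def p3_ls_chart(n, a, sigma, c):
    """Probability that all n values of a sample lie between -c and c."""
    inside = phi_cdf((c - a) / sigma) - phi_cdf((-c - a) / sigma)
    return inside ** n


def samples_needed(p_inside, miss=0.01):
    """Smallest integer N with p_inside**N < miss."""
    if not 0.0 <= p_inside < 1.0:
        raise ValueError("p_inside must lie in [0, 1)")
    n = 1
    while p_inside ** n >= miss:
        n += 1
    return n


print(round(p2_average_chart(3, 0.5, 1.0), 3))
print(samples_needed(0.991), samples_needed(0.977))
```

Applied to the tabulated products $P_1P_2$ or to $P_3$, `samples_needed` gives the corresponding $N_1$ or $N_2$; small differences from the printed table can arise because the table entries are rounded to three decimals.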


REFERENCES

[1] American Standards Association, Control Chart Method of Controlling Quality during
Production, Z1.3-1942.

[2] L. H. C. Tippett, "On the extreme individuals and the range of samples taken from a
normal population," Biometrika, Vol. 18 (1926), pp. 364-387.

[3] E. S. Pearson, "The probability integral of the range in samples of n observations
from a normal population," Biometrika, Vol. 32 (1942), pp. 301-308.

[4] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 91.

[5] Hojo, "Distribution of median from a normal population," Biometrika, Vol. 23 (1931),
p. 316.

[6] W. A. Shewhart, Economic Control of Quality of Manufactured Product, D. Van
Nostrand Co., 1931, p. 282.

[7] J. F. Daly, "On the use of the sample range in an analogue of Student's t-test," Annals
of Math. Stat., Vol. 17 (1946), pp. 71-74.

[8] Karl Pearson, Tables for Statisticians and Biometricians, Cambridge University
Press, 1914.


SUFFICIENCY, TRUNCATION AND SELECTION¹

By John W. Tukey

Princeton University

1. Summary. The fact that the mean and variance were sufficient statistics
for a univariate normal distribution truncated at a fixed point was known to
Fisher by 1931 [2]. Hotelling [3] has recently observed the corresponding fact
for the truncated multivariate normal distribution.

¹ Prepared in connection with work sponsored by the Office of Naval Research.

It is the aim of this note to point out that these are special cases of a general 
result, namely: If a family of distributions admits a set of sufficient statistics, then 
the family obtained by truncation to a fixed set, or by fixed selection, also admits the 
SAME set of sufficient statistics. 

2. Representation. The basic formal results about sets of sufficient statistics 
are due to Fisher [1], whose arguments, with obvious modifications, establish 
that families of distributions satisfying the usual conditions have sufficient 
statistics. The converse was established by Koopman [4] for a reasonably wide 
class of families. 

The usual condition can be easily handled and given wide application by
representing the family of distributions in a form suggested to the author by
Rubin, and ascribed by him to Cramér, namely:

$dF(x \mid \theta) = c(\theta)\, f(x \mid \theta)\, d\mu(x),$

where x is a possibly multidimensional chance quantity (i.e. random variable),
$\theta$ is a possibly multidimensional parameter, $c(\theta)$ is a positive real function of $\theta$
which serves to normalize the distribution, $f(x \mid \theta)$, the relative probability
density, is a non-negative real function of x and $\theta$, and $\mu(x)$ is a positive measure
function. In this representation the necessary and sufficient condition that
$(t_i(x))$ are a set of sufficient statistics for $\theta$ is the existence of functions $a_i(\theta)$
such that (cf. Koopman [4])

(1) $\dfrac{\partial}{\partial\theta} \log f(x \mid \theta) = \sum_i a_i(\theta)\, t_i(x).$

When $\theta$ is a vector, the derivative is to be interpreted as the gradient (a vector)
and the $a_i(\theta)$ are to be vector-valued functions of $\theta$. We notice that this
condition concerns only the relative density function.

3. Proof of result. Suppose the family $F(x \mid \theta)$ is truncated onto a Borel set
E; this means that

$$
\Pr\{x \text{ in } B \mid F(x \mid \theta) \text{ truncated to } E\} = \frac{\Pr\{x \text{ in } B \cap E \mid F(x \mid \theta)\}}{\Pr\{x \text{ in } E \mid F(x \mid \theta)\}}.
$$

If $\phi_E(x)$ is the characteristic function of E, which is = 1 for x in E and = 0
otherwise, and if

$$
k(\theta) = \Pr\{x \text{ in } E \mid F(x \mid \theta)\} = \int_E dF(x \mid \theta),
$$

then the probability element of $F(x \mid \theta)$ truncated to E is

$$
[c(\theta)/k(\theta)]\, f(x \mid \theta)\, \phi_E(x)\, d\mu(x) = c'(\theta)\, f(x \mid \theta)\, d\nu(x),
$$

where $c'(\theta) = c(\theta)/k(\theta)$ and $d\nu(x) = \phi_E(x)\, d\mu(x)$. Truncation has not changed
the relative density function, and the result follows from the form of (1).

Next suppose that, instead of accepting values with probability one in E
and with probability zero outside E, we select according to a fixed Borel function
$\phi(x)$, the chance of accepting a value x being $\phi(x)$. The new family of
distributions has the same sufficient statistics for the same reason.
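
A small numerical illustration of the result, for the case mentioned in the Summary, may be helpful. The sketch assumes a univariate normal family truncated to the half-line x ≥ 0; the function names and the two samples are chosen for the example. The two samples share the same sum and sum of squares, so if these remain sufficient after truncation the log-likelihoods must agree for every parameter value.

```python
from math import erf, log, pi, sqrt


def phi_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def truncated_normal_loglik(sample, mu, sigma, lower=0.0):
    """Log-likelihood of a normal(mu, sigma) family truncated to [lower, inf)."""
    k = 1.0 - phi_cdf((lower - mu) / sigma)       # k(theta) = Pr{x in E}
    total = 0.0
    for x in sample:
        if x < lower:
            return float("-inf")                   # value outside the set E
        total += (-0.5 * ((x - mu) / sigma) ** 2
                  - log(sigma * sqrt(2.0 * pi))
                  - log(k))
    return total


# Two different samples with equal sum (12) and equal sum of squares (62).
sample_a = [1.0, 5.0, 6.0]
sample_b = [2.0, 3.0, 7.0]

for mu, sigma in [(0.0, 1.0), (3.0, 2.0), (-1.0, 0.5)]:
    gap = (truncated_normal_loglik(sample_a, mu, sigma)
           - truncated_normal_loglik(sample_b, mu, sigma))
    print(mu, sigma, abs(gap) < 1e-9)
```

The normalizing term log k(θ) enters both likelihoods identically, which mirrors the remark that truncation alters only c(θ) and the measure, not the relative density.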

REFERENCES

[1] R. A. Fisher, "Theory of statistical estimation," Camb. Phil. Soc. Proc., Vol. 22
(1923-25), pp. 700-725.

[2] R. A. Fisher, "The sampling error of estimated deviates together with other
illustrations of the properties and applications of the integrals and derivatives of the
normal error function," Brit. Assn. Adv. Sci. Mathematical Tables, Vol. 1, xxvi-xxxv.

[3] H. Hotelling, "Abstracts of Madison Meeting," Annals of Math. Stat., Vol. 19 (1948).

[4] B. O. Koopman, "On distributions admitting a sufficient statistic," Trans. Amer.
Math. Soc., Vol. 39, pp. 399-409.


ON A PROBABILITY DISTRIBUTION 
By Max A. Woodbury 
University of Michigan 

1. Introduction. The problem treated is that of generalizing the Bernoulli
distribution to the case where the probability of success is not constant from trial
to trial but depends on the number of previous successes. The case where the
probability of an event depends on the number of trials is easily handled and
is not the case treated here. Several special cases of such a distribution have
been worked out at one time or another. (E.g., C. C. Craig found the solution for
one such special case and thus called the author's attention to the problem.)

The solution involves the Newton divided difference expansion of powers in a
form which can be utilized for computation if the number of trials is not too
large. In the case where the probabilities on a single trial are small, an
approximation (similar to that of the Poisson distribution to the Bernoulli distribution)
can be found.

Applications can obviously be made to urn schema in which black balls are 
replaced, but white balls are removed. Similarly, applications can be made to 
the distribution of the number of plants in a given area. 

2. Solution of the problem. Specifically the problem is as follows: "What is
the probability that in n trials of an event it will occur x times, presuming that
the probability of the event on a given trial depends only on the number of
previous successes?" Denote by P(n, x) the probability of x successes in n
trials and by $p_x$ the probability of the event after x previous successes. As is
conventional, denote $q_x = 1 - p_x$, and one can formulate the following equation
of partial differences:

(1) $P(n+1, x+1) = p_x P(n, x) + q_{x+1} P(n, x+1).$

This equation is an obvious consequence of the statement that x + 1 successes
in n + 1 trials can only occur if there are x successes in n trials and a success on
the (n + 1)st, or x + 1 successes in n trials and failure on the (n + 1)st. The
boundary conditions appropriate are:

(2) $P(n, x) = 0$ for $x < 0$ or $x > n$, and $P(0, 0) = 1$.

It is convenient and appropriate to generalize (1) while retaining the boundary
conditions (2). The equation (1) will be obtained from the following equation
by setting q = 1:

(3) $P(n+1, x+1) = (q - q_x)P(n, x) + q_{x+1}P(n, x+1).$

It will be noted for further reference at this point that:

(4) $P(n, 0) = q_0^n$

and:

(5) $P(n, n) = (q - q_0)(q - q_1)\cdots(q - q_{n-1}).$

This last suggests a change of variable of the form:

(6) $P(n, x) = F(n, x)(q - q_0)(q - q_1)\cdots(q - q_{x-1}).$

Upon substituting this expression in (3) one obtains a somewhat simpler equation
with the same boundary conditions as (2).

(7) $F(n+1, x+1) = F(n, x) + q_{x+1}F(n, x+1).$

Using the generating function:

(8) $G(x, \xi) = \sum_{n=0}^{\infty} F(n, x)\,\xi^n$

one may obtain from (7), using the boundary conditions (2), the following
ordinary linear difference equation:

(9) $G(x+1, \xi) = \xi\,[G(x, \xi) + q_{x+1}G(x+1, \xi)].$

From (4) it is easily seen that:

(10) $G(0, \xi) = 1/[1 - q_0\xi],$

and hence that the solution of (9) is:

(11) $G(x, \xi) = \xi^x/[(1 - q_0\xi)(1 - q_1\xi)\cdots(1 - q_x\xi)].$

This may be expanded in partial fractions and the result written:

(12) $G(x, \xi) = \sum_{i=0}^{x} 1/[(q_i - q_0)\cdots(q_i - q_{i-1})(q_i - q_{i+1})\cdots(q_i - q_x)(1 - q_i\xi)].$

By means of the relation in (8) one deduces readily that:

(13) $F(n, x) = \sum_{i=0}^{x} q_i^n/[(q_i - q_0)\cdots(q_i - q_{i-1})(q_i - q_{i+1})\cdots(q_i - q_x)].$

Jordan [1, p. 19, eq. (1)] shows this to be the xth Newton divided difference of $q^n$
where the expansion is in terms of $(q - q_0)\cdots(q - q_x)$, for x = 0, 1, ..., n.
The solution for (3) can now be written as:

(14) $P(n, x) = (q - q_0)\cdots(q - q_{x-1})\,F(n, x)$

from which follows:

(15) $\sum_{x=0}^{n} P(n, x) = q^n.$

As remarked before, by setting q = 1 one obtains the solution of (1) subject to
the boundary conditions (2).
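
The agreement between the recurrence (1) and the divided-difference solution (13)-(14) with q = 1 can be checked exactly for a small case. In the sketch below the function names and the particular probabilities $p_x = 1/(x+2)$ are illustrative assumptions; the closed form requires the $q_x$ to be distinct.

```python
from fractions import Fraction


def recurrence_row(p, n):
    """P(n, x) for x = 0..n, built up from P(0, 0) = 1 by equation (1)."""
    row = [Fraction(1)]
    for m in range(n):
        new = [Fraction(0)] * (m + 2)
        for x, prob in enumerate(row):
            new[x] += (1 - p[x]) * prob      # failure: still x successes
            new[x + 1] += p[x] * prob        # success: now x + 1 successes
        row = new
    return row


def closed_form(p, n, x):
    """P(n, x) from (13) and (14) with q = 1; needs distinct q_0..q_x."""
    q = [1 - pk for pk in p]
    prefactor = Fraction(1)
    for j in range(x):
        prefactor *= 1 - q[j]                # (q - q_j) with q = 1
    total = Fraction(0)
    for i in range(x + 1):
        denom = Fraction(1)
        for j in range(x + 1):
            if j != i:
                denom *= q[i] - q[j]
        total += q[i] ** n / denom           # term of the divided difference
    return prefactor * total


n = 5
p = [Fraction(1, k + 2) for k in range(n + 1)]   # p_x = 1/(x + 2), all distinct
row = recurrence_row(p, n)
print([str(v) for v in row])
print(all(closed_form(p, n, x) == row[x] for x in range(n + 1)), sum(row))
```

The probabilities sum to 1, which is the statement (15) with q = 1.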

It is clear that when all the $q_x$ are equal the Bernoulli distribution should
come out as a special case. Since in this case the divided difference becomes the
corresponding derivative divided by the appropriate factorial, one obtains:

(16) $P(n, x) = \dfrac{(1 - q_0)^x}{x!} \left. \dfrac{d^x q^n}{dq^x} \right|_{q = q_0}.$

Upon reduction this yields the usual formula, but not in the usual way.

By choosing $p_x = \lambda_x/n$ and allowing n to increase without limit one obtains
an analogue of the Poisson distribution, viz:

(17) $P(x) = (-\lambda_0)\cdots(-\lambda_{x-1}) \sum_{i=0}^{x} e^{-\lambda_i}/[(\lambda_i - \lambda_0)\cdots(\lambda_i - \lambda_{i-1})(\lambda_i - \lambda_{i+1})\cdots(\lambda_i - \lambda_x)]$

which corresponds to the expansion of $e^{-\lambda}$ about $\lambda_0, \lambda_1, \lambda_2, \ldots, \lambda_x, \ldots$ when $\lambda = 0$.


REFERENCE

[1] Charles Jordan, Calculus of Finite Differences, Chelsea Publishing Co., New York,
2nd ed., 1947.


A GRAPHICAL DETERMINATION OF SAMPLE SIZE FOR WILKS’ 

TOLERANCE LIMITS 

By Z. W. Birnbaum and H. S. Zuckerman 

University of Washington 

1. Summary. To determine the smallest sample size for which the
minimum and the maximum of a sample are the 100β% distribution-free tolerance
limits at the probability level ε, one has to solve the equation

(1) $N\beta^{N-1} - (N - 1)\beta^{N} = 1 - \varepsilon$

given by S. S. Wilks [1]. A direct numerical solution of (1) by trial requires
rather laborious tabulations. An approximate formula for the solution has
been indicated by H. Scheffé and J. W. Tukey [2]; however, an analytic proof for
this approximation does not seem to be available. The present note describes
a graph which makes it possible to solve (1) with sufficient accuracy for all
practically useful values of β and ε.

2. Construction of the graph. Substituting in (1)

$N = x\beta/(1 - \beta)$

we obtain

$1 + x = (1 - \varepsilon)\,\beta^{-N}$

and

(2) $\log (1 + x) = -\log \dfrac{1}{1 - \varepsilon} + \left(\dfrac{\beta}{1 - \beta} \log \dfrac{1}{\beta}\right) x.$

To solve (2) graphically, one has to find the intersection of the curve

(3) $y = \log (1 + x)$

with the line

$y = -\log \dfrac{1}{1 - \varepsilon} + \left(\dfrac{\beta}{1 - \beta} \log \dfrac{1}{\beta}\right) x.$

To prepare a graph on which this can be done, one first plots (3) once for all
(Figure 1, Curve C). Then one marks the points $-\log \frac{1}{1 - \varepsilon}$ on the y-axis
and labels them with the values of ε (Figure 1, Scale I); chooses a constant r > 0
and marks the points $r \log \frac{1}{1 - \varepsilon}$ on the x-axis (Figure 1, Scale II); chooses a
constant k > 0, marks the points $kr\,\frac{\beta}{1 - \beta} \log \frac{1}{\beta}$ on the x-axis, draws vertical lines
through each of these points, and labels them with the values of β (Figure 1,
Scale III); draws the line x = k (Figure 1, line L); marks the uniform Scale IV
on the x-axis.

The graph reproduced here has been prepared with r = 4, k = 5. It can 
easily be verified that the instructions on the graph lead to solutions x of (2) and 



[Figure 1. Graph for the determination of sample size (Curve C, line L, Scales I-IV). Legible instruction on the graph: 3) connect ε on Scale I with Q; the connecting line cuts curve C at a point which has abscissa x on Scale IV; read off x.]


3. Improvement by iterations. The graphical solution, usually accurate to
two significant digits, may be improved easily by iterations. Replacing (2)
by the equation

(4) $x = f(x) = \dfrac{\log (1 + x) + \log \dfrac{1}{1 - \varepsilon}}{\dfrac{\beta}{1 - \beta} \log \dfrac{1}{\beta}}$

one obtains iterations $x_{i+1} = f(x_i)$ which, for .80 < ε < .999 and .80 < β < .999,
converge rapidly to the solution of (2).

Example. For ε = .99, β = .999, one finds graphically $x_1 = 6.6$, and from
(4) the iteration formula $x_{i+1} = f(x_i)$, which yields the values $x_2 = 6.642$,
$x_3 = 6.648$, $x_4 = 6.649$, $x_5 = 6.649$. Rounding up we obtain the sample
size N = 6.649 · 999 = 6643.

For ε and β between .80 and .999 all iterations obtained from (4) are on the
same side of the exact solution and converge to it monotonically. Thus, in our
example, from $x_1 < x_2$ we conclude that $x_1$ as well as all further iterations are
smaller than the exact solution.
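
The iteration (4) is easy to carry out mechanically. The following sketch assumes natural logarithms, the substitution N = xβ/(1 − β), and a fixed number of iteration steps; the function names are hypothetical. A direct search on equation (1) is included as a cross-check.

```python
from math import ceil, log


def wilks_lhs(n, beta):
    """Left-hand side of equation (1): N*beta^(N-1) - (N-1)*beta^N."""
    return n * beta ** (n - 1) - (n - 1) * beta ** n


def sample_size_by_iteration(beta, eps, x=1.0, steps=200):
    """Iterate x <- f(x) as in (4), then convert x to N and round up."""
    slope = beta / (1.0 - beta) * log(1.0 / beta)
    intercept = log(1.0 / (1.0 - eps))
    for _ in range(steps):
        x = (log(1.0 + x) + intercept) / slope
    return ceil(x * beta / (1.0 - beta))


def sample_size_by_search(beta, eps):
    """Smallest integer N for which the left side of (1) is at most 1 - eps."""
    n = 2
    while wilks_lhs(n, beta) > 1.0 - eps:
        n += 1
    return n


print(sample_size_by_iteration(0.95, 0.95), sample_size_by_search(0.95, 0.95))
```

A starting value read from the graph would shorten the iteration, but since f is a contraction in the range of interest a crude start such as x = 1 also converges.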


References

[1] S. S. Wilks, Mathematical Statistics, Princeton University Press, 1943, p. 94.

[2] H. Scheffé and J. W. Tukey, "A formula for sample sizes for population tolerance
limits," Annals of Math. Stat., Vol. 15 (1944), p. 217.



ABSTRACTS OF PAPERS 

(Abstracts of papers presented at the New York meeting of the Institute on April 8-9,1948) 

1. Adjustment of an Inverse Matrix Corresponding to a Change in One Ele¬ 
ment of a Given Matrix. Jack Sherman and Winifred J. Morrison, The 
Texas Company Research Laboratories, Beacon, New York. 

If one element, $a_{RS}$, in a square matrix A is changed by an amount $\Delta a_{RS}$, all the elements
$b_{ij}$ in the inverse matrix B are generally changed. A simple equation has been derived by
means of which the elements $b'_{ij}$ in the resulting inverse matrix B' can be computed directly
in terms of $\Delta a_{RS}$ and the elements of B. The equation is

$$
b'_{ij} = b_{ij} - \frac{b_{Sj}\, b_{iR}\, \Delta a_{RS}}{1 + b_{SR}\, \Delta a_{RS}}.
$$

It follows that any given square matrix can be transformed into a singular matrix by
increasing any one element in the transposed inverse matrix.
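
A direct transcription of this equation into code may make its use clearer. The sketch below uses exact rational arithmetic and zero-based indices; the function name `update_inverse` and the 2-by-2 example are assumptions made for illustration.

```python
from fractions import Fraction


def update_inverse(b, r, s, delta):
    """Given B = A^(-1), return the inverse of A after element a[r][s]
    has been increased by delta (indices counted from zero)."""
    denom = 1 + b[s][r] * delta
    if denom == 0:
        raise ZeroDivisionError("the changed matrix is singular")
    n = len(b)
    return [[b[i][j] - b[i][r] * b[s][j] * delta / denom for j in range(n)]
            for i in range(n)]


# A = [[2, 1], [1, 1]] has inverse B = [[1, -1], [-1, 2]].
B = [[Fraction(1), Fraction(-1)],
     [Fraction(-1), Fraction(2)]]

# Increase a[0][1] by 3, so that A becomes [[2, 4], [1, 1]].
B_new = update_inverse(B, 0, 1, Fraction(3))
print([[str(v) for v in row] for row in B_new])
```

The denominator vanishes when the increment equals −1/b_SR, in which case the changed matrix has no inverse; the sketch raises an error in that situation.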

2. The Distribution of the Number of Exceedances. E. J. Gumbel, New York
and H. von Schelling, Naval Research Laboratory, New London, Conn.

The probability for the mth observation in a sample of size n taken from a population
with an unknown distribution of a continuous variate to be exceeded x times in N future
trials is studied. The averages, moments, and the cumulative probability of the number
of exceedances are calculated with the help of the hypergeometric series. The tolerance
limits constructed by Wilks are special cases of the cumulative probability. The mean
number of exceedances is the same as in Bernoulli's distribution. In some cases there are
two modes, namely m − 1 and m − 2. If n = N, the most probable number of exceedances
over the mth largest value is either m or m − 1, and the median number of exceedances is
equal to m − 1. In 50% of all cases, the largest (smallest) of n past observations will not
(always) be exceeded in n future observations. If n and N are both large and equal, the
distribution of the number of exceedances over the median is normal whereas the
distribution of the extremes, similar to Poisson's distribution, has a mean m, and a variance 2m.
The variance of the number of exceedances is largest for the median, and smallest for the
extremes of the previous sample. These distribution-free methods may be applied to
meteorological phenomena, such as floods, droughts, extreme temperatures (the killing
frost), largest precipitations, etc., and permit the forecasting of the number of cases
surpassing a given severity.

3. Note on the Power Function of a Quality Control Chart. Leo A. Aroian, 
Hunter College, New York. 

The power function of a quality control chart is given for a sequence of N sample points
in terms of α and γ, the probability of a Type I error and the power function respectively for
a single sample point. Two different models are considered and the generalization to two
quality control charts is indicated.

4. Tests Between Two Means or Regression Coefficients When Observations 
are of Unequal Precision. Uttam Chand, University of North Carolina, 
Chapel Hill. 

Relative merits of different tests available for testing two means or two regression 
coefficients in relation to asymmetric and symmetric aspects of Student’s hypothesis in case 
of unequal population variances have been reconsidered. In this connection the distribution 
of a certain quantity depending on k, where k is some inexact value of the unknown ratio of 
variances, has been obtained. The hypothesis of the equality of two linear regression 
functions in case of unequal residual variances has also been considered. 

5. Functional Expansions. Eugene W. Pike, Boston, Massachusetts. 

This paper calls attention to a new type of estimation problem, arising both in the 
interpretation of experimental data from complex experiments, and in the design of analogue 
computers for functions of several independent variables. 

It has long been known, though not widely recognized, that the partial sums of rows and 
columns arising in the bivariate analysis of variance represent the least squares fit of a 
functional form [f(x) + g(y)] to a tabular function F(x, y) of two independent variables, for 
example. More recently, several people have realized gradually that independent causes 
may combine in much more complicated ways to produce a common effect, and that 
correspondingly more complicated functional combinations, such as [f(x) + g(y) + h(x)·k(y)], 
can be fitted by least squares to tabular functions of x and y. 

Examples of such expansions, as applied both to the design of computers and to the 
analysis of experimental data, will be given. 

This presentation is based on work supported by the Air Materiel Command, USAF. 
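
The simplest case mentioned in this abstract, the least squares fit of f(x) + g(y) to a complete two-way table by row and column means, can be sketched as follows. The function names are illustrative, and the convention of folding the grand mean into f is an assumption made here for definiteness (f and g are only determined up to an additive constant).

```python
def additive_fit(table):
    """Least squares fit of f(x_i) + g(y_j) to a complete two-way table.

    Returns (f, g); the grand mean is folded into f (an arbitrary convention).
    """
    rows, cols = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (rows * cols)
    row_means = [sum(row) / cols for row in table]
    col_means = [sum(table[i][j] for i in range(rows)) / rows for j in range(cols)]
    f = row_means                       # f(x_i) = mean of row i
    g = [c - grand for c in col_means]  # g(y_j) = mean of column j minus grand mean
    return f, g


def residuals(table, f, g):
    """Table of F(x_i, y_j) - f(x_i) - g(y_j)."""
    return [[table[i][j] - f[i] - g[j] for j in range(len(g))] for i in range(len(f))]
```

For an exactly additive table the residuals vanish; otherwise every row and column of residuals sums to zero, which is the usual signature of this least squares fit. The more complicated form with a product term h(x)·k(y) is not covered by this sketch.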

6. The Geometric Range for Distributions of Cauchy’s Type. E. J. Gumbel, 
New York City, and R. D. Keeney, Metropolitan Life Insurance Company, 
New York City. 

From each of N samples of large size n the largest and the smallest values X_{n,ν} and X_{1,ν} 
(ν = 1, 2, ..., N) are taken, where each X is measured from the central value of Nn 
observations. The sample size must be so large that the probability of any extreme X_{n,ν} and -X_{1,ν} 
being negative may be neglected. The distribution of the geometric means ρ of the N pairs 
of extremes, henceforth called geometric ranges, is derived under the assumption that the 
initial distribution is symmetric, unlimited and of the Cauchy type, which implies that the 
moments of an order equal to, or larger than k (k > 0) diverge. Let u be the expected 
largest value. Then the probability density of β = 2 u^k ρ^(-k), obtained from a theorem of Elfving 
(Biometrika, Vol. 35), is βK_0(β), where K_0 is a Bessel function. This permits calculation of 
all moments of β. Methods are given for estimating the parameters u and k. The 
distribution of the geometric ranges ρ is again a Bessel function. A probability paper is 
constructed for testing the hypothesis that the initial distribution is of Cauchy’s type. A 
strict parallelism is established between the asymptotic distributions of the range for the 
exponential type, and of the geometric range for Cauchy’s type. This provides a criterion 
as to which of the two types the initial distribution belongs. 

7. On Sums of Random Integers Reduced Modulo m. A. Dvoretzky, Institute 
for Advanced Study, Princeton and J. Wolfowitz, Columbia University, 
New York City. 

Let X_n (n = 1, 2, ...) be an infinite sequence of independent, integral-valued chance 
variables, and let m be any fixed integer greater than 1. Put S_n = X_1 + ... + X_n and denote S_n 
reduced mod m by Y_n; i.e., Y_n is a random variable which assumes only the values 
j = 1, 2, ..., m with respective probabilities P_n(j) = Prob(S_n ≡ j (mod m)). Necessary and 
sufficient conditions are obtained for Y_n to be equidistributed in the limit, i.e., for 
lim P_n(j) = 1/m (j = 1, 2, ..., m) as n → ∞. Some easily applicable sufficient conditions are deduced 
and the cases m = 2, 3, 4 are studied in detail. The rapidity with which P_n(j) → 1/m is also 
studied. 
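
The limiting equidistribution discussed in this abstract can be illustrated numerically. The sketch below is not the authors' method: it assumes, for simplicity, identically distributed steps (the abstract only requires independence), writes the residues as 0, 1, ..., m - 1 (with m ≡ 0), and uses illustrative function names.

```python
def step(dist, pmf, m):
    """One convolution step: distribution of (S + X) mod m."""
    out = [0.0] * m
    for s, ps in enumerate(dist):
        for x, px in pmf.items():
            out[(s + x) % m] += ps * px
    return out


def residue_distribution(pmf, m, n):
    """Distribution of S_n mod m for n i.i.d. integer steps with law pmf (a dict value -> prob)."""
    dist = [1.0] + [0.0] * (m - 1)  # S_0 = 0 with probability one
    for _ in range(n):
        dist = step(dist, pmf, m)
    return dist
```

With steps equal to 0 or 1 with probability 1/2 each and m = 3, the residues approach probability 1/3 each geometrically fast. With steps equal to 0 or 2 and m = 2, the sum is always even and no equidistribution occurs, which shows that some condition on the X_n is needed.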





8. The Corpuscle Problem: Estimating the Surface-Volume Ratio of a 
Corpuscle of Arbitrary Shape. Jerome Cornfield, National Institutes of 
Health and Harold W. Chalkley, National Cancer Institute, Bethesda, 
Md. 

Consider a space containing F, a closed figure of arbitrary shape, volume V and surface 
area S. Let a line segment of length r be thrown in the space in such a fashion that we have 
uniform distribution of the probabilities that the end point P occupies any position in the 
space and that the other end point P' occupies any position on the surface of a sphere of 
radius r with center at P. Count the number of end points falling in F (0, 1 or 2 for a single 
throw), call it the number of hits, and denote it by h. Count the number of times the line 
intersects the surface (0, 1 or 2 times for a single throw for a non-reentrant figure, possibly 
more for a re-entrant one), call it the number of cuts and denote it by c. Then, it is proved 
that rE(h)/E(c) = 4V/S. This result is intended to provide a theoretical basis for 
estimating the surface-volume ratio of physical objects of any shape. 
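
The relation rE(h)/E(c) = 4V/S can be checked by simulation for a figure whose volume and surface are known. The sketch below assumes F is a sphere of radius R, for which 4V/S = 4R/3; the bounding box, the default parameter values and the function names are choices made here for illustration and are not taken from the paper.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform direction on the unit sphere (z uniform in [-1, 1], uniform azimuth)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def estimate_ratio(R=1.0, r=0.5, throws=200_000, seed=0):
    """Monte Carlo estimate of r*E(h)/E(c) for a sphere of radius R centred at the origin."""
    rng = random.Random(seed)
    half = R + r  # box half-width: every segment that can touch the sphere starts inside the box
    hits = cuts = 0
    for _ in range(throws):
        p = [rng.uniform(-half, half) for _ in range(3)]
        d = random_unit_vector(rng)
        q = [p[i] + r * d[i] for i in range(3)]
        # Hits: end points falling inside the sphere (0, 1 or 2).
        hits += (sum(c * c for c in p) < R * R) + (sum(c * c for c in q) < R * R)
        # Cuts: roots t in [0, 1] of |p + t*r*d|^2 = R^2, i.e. a*t^2 + b*t + c0 = 0.
        a = r * r
        b = 2.0 * r * sum(p[i] * d[i] for i in range(3))
        c0 = sum(c * c for c in p) - R * R
        disc = b * b - 4.0 * a * c0
        if disc > 0.0:
            root = math.sqrt(disc)
            for t in ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)):
                cuts += 0.0 <= t <= 1.0
    return r * hits / cuts
```

Because hits and cuts are both counted over the same throws, their ratio estimates E(h)/E(c) without needing the volume of the surrounding space. For R = 1 the estimate should fluctuate around 4/3, with a sampling error that shrinks as the number of throws grows.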

9. Generalized Hit Probabilities with a Gaussian Target. D. A. S. Fraser, 
Princeton University. 

In the Supplement to the Journal of the Royal Statistical Society, Vol. 8 (1946), L. B. C. 
Cunningham and W. R. B. Hynd proposed a problem and gave an approximate solution 
covering a partial range of parameter values: to find the probability that a moving target will 
survive a burst of “n” rounds from a rapid-firing gun, account being taken of correlation 
between the different points of aim. 

Generalizing from the case of a two dimensional target to “k” dimensions, this paper 
gives the probability for 0, 1, 2, ..., n hits, under the following assumptions: the “n” points 
of aim have a Multivariate Gaussian Distribution, the dispersion error has a Gaussian 
Distribution, and the target is a Gaussian Diffuse Target, that is, the probability of a hit on a 
particular round as a function of the coordinates of the shell has the form of “a constant 
times a Gaussian probability density function.” 

Limiting distributions are obtained as n → ∞, subject to a variety of limiting conditions. 

Numerical values for the probability of at least one hit are plotted when n = 5, for a 
range of values, relative to the target size, of dispersion and aiming errors. 

10. A New Continuous Sampling Inspection Plan Based on an Analysis of Costs. 
F. E. Satterthwaite, General Electric Company, Bridgeport, Connecticut. 

Inspection, like all other industrial operations, must be run to produce the most return 
for the lowest cost. The costs include overhead and running inspection costs; complaint 
costs; rework and scrap costs; and the costs of unnecessary process rejections. Also one 
must consider the frequencies of occurrence of these costs. These include the process 
average percent defective; the probability of occurrence of a complaint; and the frequency of 
occurrence of quality deteriorations. 

For continuous inspection, the percentage of the product to be inspected has a very 
simple formula: P = √(SC/HM), where S is the sensitivity of the sampling plan used, C is 
the complaint cost, H is the effective inspection cost, and 1/M is the quality deterioration 
rate. 

It was also necessary to develop a new continuous sampling inspection plan which would 
be efficient over the entire range of continuous sampling applications. The plan presented 
is a sequential plan which, with suitable attention to details, is easily applied on the shop 
floor. The Dodge Plan is a special case and is efficient only in a small percentage of 
applications. 





11. On the Levels of Significance of the F and Beta Distributions. Leo A. 
Aroian, Hunter College, New York. 

Two formulas are given for the determination of the levels of significance of the F and 
Beta distributions. In the case of the F distribution a previous set of formulas (Biometrika, 
Vol. 34, pp. 359-360) is modified to give 3 significant figure accuracy, n_1, n_2 ≥ 24. The set 
for the Beta distribution is of Cornish-Fisher type, p, q ≥ 0. The advantages of these over 
Paulson’s F formula and Carter’s z formula are the avoidance of the solution of a quadratic 
in the case of Paulson’s formula, and the avoidance of the exponential tables in the case of 
Carter’s z formula. A short numerical table compares the three methods for selected values 
of n_1 and n_2. 


12. Certain Statistics for Samples of 3 From a Rectangular Population. Julius 
Lieblein, National Bureau of Standards. 

A continuation of a study presented at the Madison meeting of the Institute of 
Mathematical Statistics last September. (For abstract see Annals of Math. Stat., December 1948, 
p. 695.) The previous paper derived properties of the statistics 

    y_1 = ...,    y_2 = ...,    (x' + x'')/2, 

where x_1, x_2, x_3 are the observations, ordered by increasing size, in an independent random 
sample of three observations from a normal population, and x' and x'', x' > x'', are the two 
closest of the three. In the present paper distributions (joint as well as simple) are obtained 
for the above three statistics and also for the remaining observation not included in 
the closest pair, for samples of 3 from a rectangular population, and a theorem is proved 
concerning the distribution of y_3 for a wide class of continuous populations. 


13. The Choice of Lot Inspection Plans on the Basis of Cost. F. E. 
Satterthwaite and Burton Grad, General Electric Company, Bridgeport, 
Connecticut. 

An extension of the first paper to single sampling inspection plans. The important 
concepts involved are the break-even quality level, the operating ratio, and the weighted prior 
odds that a lot is a good lot. Charts are being prepared which can be entered with simple 
functions of the costs and which give directly the sample size and acceptance number for 
the most efficient single sampling inspection plan. 

It appears promising that the method can be extended to double and sequential sampling 
plans. This is imperative because of the large portion of the time that “no-inspection” is 
the most efficient single sampling plan. 



NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest. 

Personal Items 

Enrique Loizelier Blanco, Professor of Statistics in the University of Madrid, 
has just finished the first year of experimentation in Quality Control Methods 
in different plants. Interest in these new statistical applications started 
in Spain during 1946 and has increased rapidly since then, especially this year 
after consecutive bimonthly intensive courses which Professor Blanco has been 
teaching. 

Mr. Osmer Carpenter, formerly an Instructor in the Department of Statistics 
and Mathematics at Iowa State College is now doing statistical work for Carbide 
and Carbon Chemical Corp., Oak Ridge, Tennessee. 

Dr. K. L. Chung, formerly of Princeton University, has been appointed to an 
assistant professorship at Cornell University. 

Dr. Clyde H. Coombs, Associate Professor of Psychology and Chief of Re¬ 
search Division, Bureau of Psychological Services at the University of Michigan, 
is on leave of absence for the academic year to work at Harvard University on 
problems of scaling. 

Dr. Meyer A. Girshick, formerly with the Douglas Aircraft Co., Santa Monica, 
California, has accepted a professorship in the Department of Statistics, Stanford 
University, Stanford, California. 

Dr. M. J. Gottlieb, who has been with the Institute for Advanced Study at 
Princeton, has been appointed to an assistant professorship at the Newark 
College of Rutgers University. 

Associate Professor E. H. C. Hildebrandt of Northwestern University has 
been elected President of the National Council of Teachers of Mathematics. 
He is also National Secretary-Treasurer of Pi Mu Epsilon and Secretary of the 
Mathematics Section of the Central Association of Science and Mathematics 
Teachers. 

Dr. C. A. Hollingsworth, formerly with the Acetate Section of the DuPont 
Company, is now an instructor in the Department of Chemistry, University of 
Pittsburgh. 

Professor William G. Madow, who has been with the Institute of Statistics 
at the University of North Carolina, has been appointed Professor of Statistics 
at the University of Illinois. 

Dr. Zenon Szatrowski, formerly teaching in the Economics Department of 
Northwestern University, has accepted an associate professorship in the Depart¬ 
ment of Economics, University of Oregon, Eugene, Oregon. 

Mr. Eric Weyl has resigned his position as staff engineer in the Chicopee Manu¬ 
facturing Corporation and is now conducting his own business as a textile en¬ 
gineering consultant in Manchester, New Hampshire. 



New Members 

The following persons have been elected to membership in the Institute (December 1, 
1948 to February 28, 1949). 

Abruzzi, Adam, M.S. (Columbia Univ.) Student in engineering at Columbia University, 
22 W. 107th Street, Shanks Village, New York. 

Agarwal, Satya P., M.A. (Agra Univ., India) Student at University of California, 
International House, Berkeley 4, California. 

Anderson, Robert W., M.A. (Columbia Univ.) Student at Columbia University, 81488-112 
Road, Queens Village 9, New York. 

Bahadur, R. R., M.A. (Univ. of Delhi, India) Graduate Student at University of North 
Carolina, Chapel Hill, North Carolina. 

Blom, Gunnar, Fil.kand. (Stockholm) 01 of SkoLkonitnys vag S, Aspudden, Sweden. 

Burrows, Glenn L., M.A. (Michigan State College) Research Associate, P.O. Box 108, 
Institute of Mathematical Statistics, Chapel Hill, North Carolina. 

Chapman, Carlos A., Jr., M.S. (Univ. of Michigan) Sales Statistician, Argus, Inc., Ann 
Arbor, Michigan, 834 W. Huron St., Ann Arbor, Mich. 

Chiang, Chin Long, M.A. (Univ. of Calif.) Student at the University of California, SSO-A 
Panoramic Way, Berkeley 4, California. 

Coggins, Paul B., M.S. (Univ. of Wisconsin) Graduate Teaching Assistant, University 
of Michigan, University Club, Madison 5, Wisconsin. 

Crapsey, Marcus T., A.B. (Univ. of Michigan) Graduate student at the University of 
Michigan, CIS Monroe, Ann Arbor, Michigan. 

Coy, John W., M.A. (Univ. of New Mexico) Teaching Fellow, Department of 
Mathematics, University of Michigan, 2044 Whilewood, Ann Arbor, Michigan. 

Cutkosky, Richard E., Student at Carnegie Institute of Technology, Box 401, Carnegie 
Institute of Technology, Pittsburgh, Pennsylvania. 

DelPriore, Francis R., B.A. (New York Univ.) Associate Statistician, U. S. Naval 
Engineering Experiment Station, 2609-22nd Street, N.E., Washington 18, D. C. 

Desind, Philip, M.S. (College of City of N. Y.) Statistician, Bureau of Ships, Navy 
Department, Washington, D. C., 7418 Georgia Ave., N.W., Washington, D. C. 

Dutka, Solomon, M.A. (Columbia Univ.) Chief Statistician, c/o Elmo Roper, 30 
Rockefeller Plaza, New York City, New York. 

Dwass, Meyer, B.A. (George Washington Univ.) Graduate student at Columbia 
University, Apt. 8A, 609 W. US St., New York, New York. 

Eastman, Walter F., A.B. (Harvard) Central Technical Department, The American 
Brass Co., Waterbury, Connecticut. 

Eisenpress, Harry, B.A. (College of City of N. Y.) National Bureau of Economic 
Research, 1819 Broadway, New York 23, New York, 8985 Ocean Parkway, Brooklyn 24, 
New York. 

Fellows, Clifford Martin, B.S. (Boston Univ.) Assistant Instructor, Boston University, 
Bureau of Research and Statistics, S8S Commonwealth Avenue, Boston IS, Massachusetts. 

Gowen, John W., Ph.D. (Columbia Univ.) Professor of Genetics, Genetics Department, 
Iowa State College, 2014 Kildee, Ames, Iowa. 

Greenwood, Robert E., Ph.D. (Princeton Univ.) Assistant Professor of Applied 
Mathematics, University of Texas, 1104 Windsor Road, Austin, Texas. 

Hald, Anders, Ph.D. (Univ. of Copenhagen) Professor of Statistics, University of 
Copenhagen, Emdrupvenge 94, Copenhagen 0, Denmark. 

Helms, William R., Student at Ohio State University, Stadium Club, Ohio State University, 
Columbus 10, Ohio. 

Hemphill, F. M., M.S. Ph. (Univ. of Michigan) Major, U. S. Public Health Service, School 
of Public Health, University of Michigan, Ann Arbor, Michigan. 





Himes, Harold W., B.S. (George Pepperdine College, Los Angeles) Statistician, Test 
Design and Analysis Section, U.C.D.W.R., U. S. Navy Electronics Laboratory, San 
Diego 52, California. 

Hutchinson, L. Charles, Ph.D. (Mass. Institute of Tech.) Associate Professor of 
Mathematics, Polytechnic Institute of Brooklyn, Brooklyn, New York. 

Klahr, Carl N., M.S. (Carnegie Institute of Tech.) Student, Atomic Energy Commission 
Fellow, Carnegie Institute of Technology, 6SS7 Phillips Avenue, Pittsburgh 17, 
Pennsylvania. 

Kraemer, Herbert F., B.S. (Univ. of Delaware) Statistical Engineer, Technical 
Supervisor, Commercial Solvents Corporation, Terre Haute, Indiana, 1514 South 7th St., 
Terre Haute, Indiana. 

Kuebler, Roy R., Jr., A.M. (Univ. of Pennsylvania) Associate Professor of Mathematics, 
Dickinson College, Carlisle, Pennsylvania. 

Lafontant, Herne E., M.S. (Atlantic Univ.) Student at the University of Michigan, 
61B Monroe, Ann Arbor, Michigan. 

Lal, Dip Narayan, Ph.D. (Edinburgh Univ.) Lecturer in Mathematics, Patna University, 
New Dak Bungalow Road, Patna, Bihar, India. 

Liserre, Guido Orlando G., Profesor de Estadistica, Mendoza SB 40, Rosario, R., Argentina. 

Matson, J. H., B.A. (Univ. of Wisconsin) Statistician, Baker Manufacturing Company, 
Evansville, Wisconsin. 

Monsch, Henry D., B.S. (Missouri School of Mines & Metallurgy, Rolla) Metallurgist, 
Aluminum Company of America, Fabricating Division, Alcoa, Tennessee, BB07 Lake 
Shore Drive, Knoxville, Tennessee. 

Moore, Lucius T., Ph.D. (Johns Hopkins Univ.) Associate Professor, Department of 
Mathematics, Brooklyn College, SOB Hicks Street, Brooklyn, New York. 

Noack, Albert, Ph.D. (Kiel, Germany) Privatdozent, Studienrat, (24a) Hamburg- 
Lokstedt II, Tibarg 26, Germany. 

Patton, Robert E., A.B. (N. Y. State Teachers College, Albany) Graduate student at the 
University of Michigan, BSS Linden St., Ann Arbor, Michigan. 

Potter, Muriel, Ph.D. (Columbia Univ.) Instructor in Psychological Foundations, 
Educational Research and Reading Supervisor, Teachers College, Columbia University, 
414 Riverside Drive, New York SB, New York. 

Putz, Robert R., B.A. (Univ. of Minnesota) Teaching Assistant, Department of 
Mathematics, University of California, 1631 Cornell Avenue, Berkeley 2, California. 

Ratoosh, Philburn, M.A. (Columbia Univ.) Assistant in Psychology, Department of 
Psychology, Columbia University, New York 27, New York. 

Richardson, Wyman, Jr., S.B. (Harvard) Graduate student at the University of North 
Carolina, S08-B, Chapel Hill, North Carolina. 

Rosenbaum, Sidney, M.A. (Cambridge) Scientific Officer, Ministry of Works, 31, Multon 
House, Shore Place, London E.g., England. 

Savage, I. Richard, M.S. (Univ. of Michigan) Student at Columbia University, 1414 
John Jay Hall, New York 27, New York. 

Sheerin, Gail, A.B. (Univ. of Rochester) Statistical Technician, A.E.C. Project, 
University of Rochester, 1091 Highland Avenue, Rochester, New York. 

Siegert, Arnold J. F., Ph.D. (Leipzig, Germany) Professor of Physics, Department of 
Physics, Northwestern University, Evanston, Illinois. 

Simpson, Paul B., Ph.D. (Cornell Univ.) Assistant Professor of Economics, Department 
of Economics, Stanford University, California. 

Solem, Alison D., M.S. (Harvard Univ.) Chief of Fragmentation Section, Naval Ordnance 
Laboratory, White Oak, Maryland, 1S1 Galveston St., S.W., Washington SO, D. C. 

Sorensen, Frederick A., B.S. (Carnegie Institute of Tech.) Teaching Assistant in 
Mathematics, Carnegie Institute of Technology, 1204 East End Avenue, Pittsburgh 18, 
Pennsylvania. 





Steel, Robert G. C., M.A. (Acadia Univ., Canada) Instructor and Research Associate, 
Statistical Laboratory, Iowa State College, Ames, Iowa. 

Taylor, Francis B., A.M. (Columbia Univ.) Instructor in Mathematics, Manhattan 
College, New York and Graduate student at Columbia University, 346 E. 198 St., Bronx 
68, New York. 

Terrell, James R., A.B. (Univ. of Michigan) Statistical Clerk, Research Center for 
Group Dynamics, P.O. Box 851, Ann Arbor, Michigan. 

Tick, Leo J., B.S. (Iowa State College) Research Graduate Assistant, Statistical 
Laboratory, Iowa State College, Ames, Iowa. 

Tyler, Sylvanus A., S.M. (Univ. of Chicago) Associate Mathematician (Biometrics), 
Argonne National Laboratory, P.O. Box 5207, 9059 So. Stewart Avenue, Chicago 80, 
Illinois. 

Tysver, Joseph B., M.A. (Washington State College) Teaching Fellow, University of 
Michigan, 1404 Ening Court, Willow Run Village, Michigan. 

Umarji, Raghavendra R., A.M. (Columbia Univ.) Lecturer in Mathematics, Bombay 
Educational Service, 509 John Jay Hall, Columbia University, New York 27, New York. 

Wilburn, A. J., A.B. (Howard Univ.) Statistician, Civil Aeronautics Board, Washington, 
D. C., 85-46th Place, N.E., Washington, D. C. 


Correction 

The information following Paul Koditschek's name which appeared in the March issue 
of the Annals, page 149, should have appeared as follows: 

Koditschek, Paul, LL.D. (Univ. of Vienna) Research Associate, Scientific Research Service, 
819 W. 18th Street, New York 14, New York. 

(It was implied in the original notice that Scientific Research Service is connected with 
Columbia University.) 


News Item from Cornell 

With the continued support of a research contract with the Office of Naval Research, the 
Mathematics Department of Cornell University is further expanding research and 
instruction in the theory of probability and its applications. At present Professors Feller, Kac, 
Chung and Dr. Donsker are participating in the work. Professor G. Elfving of the 
University of Helsingfors has been appointed Visiting Professor of Mathematical Statistics 
for the academic years 1949-1951. Professor J. L. Doob, on sabbatical leave from the 
University of Illinois, will spend the year 1949-50 at Cornell. Dr. Gilbert Hunt has been 
appointed Assistant Professor of Mathematics. 



REPORT ON THE NEW YORK MEETING OF THE INSTITUTE 


The thirty-eighth meeting of the Institute of Mathematical Statistics was 
held at Columbia University, New York City on Friday afternoon and Saturday, 
April 8-9, 1949. The meeting was attended by 93 persons including the follow¬ 
ing 80 members of the Institute: 

A. Abruzzi, T. W. Anderson, Leo A. Aroian, Robert Bechhofer, A. A. Bennett, Joseph 
Berkson, Allan Birnbaum, C. I. Bliss, Paul Boschan, P. G. Carlson, Uttam Chand, Yumen 
Chen, E. P. Coleman, T. E. Cope, Jerome Cornfield, L. M. Court, M. I. Cropsen, J. H. 
Curtiss, Cuthbert Daniel, P. R. Del Priore, W. E. Deming, J. A. Dudman, David Durand, 
C. W. Dunnett, A. Dvoretzky, P. S. Dwyer, Churchill Eisenhart, H. L. Edgett, Harry 
Eisenpress, Lillian R. Elveback, D. A. S. Fraser, Murray Geisler, L. A. Goodman, J. I. 
Griffin, C. C. Grove, E. J. Gumbel, Miriam S. Harold, Mina Haskind, L. H. Herbach, Harold 
Hotelling, Cuthbert Hurd, Arthur Kaufman, Roger D. Keeney, Paul Koditschek, Carl F. 
Kossack, Howard Levene, Jack Laderman, I. D. Lorge, C. L. Marks, Paul Meier, Frederick 
Mosteller, E. B. Mundie, C. M. Mottley, I. U. Mulk, Paul Neurath, G. E. Noether, Doris 
Newman, M. L. Norden, E. W. Pike, J. K. Perrin, H. M. Rosenblatt, Frank Saidel, William 
Salkind, F. E. Satterthwaite, Richard Savage, Henry Scheffé, H. L. Seal, Jack Sherman, 
Rosedith Sitgreaves, J. H. Smith, J. J. Sodano, Herbert Solomon, Mary N. Torrey, J. W. 
Tukey, S. S. Wilks, D. F. Votaw, Helen M. Walker, Lionel Weiss, Jack Wolfowitz and 
W. W. Wryht. 

The Friday afternoon session consisted of a Symposium on Applications of 
Multivariate Analysis, Professor S. S. Wilks of Princeton University presiding. 
The following two invited papers were given: 

1. Tests of Differences in Composite Growth Measurements in Pig Feeding Trials, J. Wishart, 
Cambridge University and University of North Carolina. 

2. Fields of Application of Multivariate Analysis, Harold Hotelling, University of North 
Carolina. 

The prepared discussion was presented by Professor S. N. Roy, Presidency 
College, Calcutta, and Columbia University, followed by discussion from the 
floor. 

The Saturday morning session was opened by a business meeting, Dr. 
Churchill Eisenhart, National Bureau of Standards, presiding. Among other 
items of business the Constitution of the Institute was amended to provide for 
Institutional Membership, and the by-laws amended to specify the status and 
privileges of Institutional Members. The revised Constitution and By-Laws 
appear elsewhere in this issue. 

The second part of the session, Dr. W. Edwards Deming presiding, was 
devoted to an invited address: Non-Linear Regression Laws and “Internal Least 
Squares,” by Dr. H. O. Hartley, University College, London and Princeton 
University. 

At the Saturday afternoon session, Professor Henry Scheffé, Columbia 
University, presiding, the following contributed papers were presented, ten in 
person, three by title: 

1. Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given 
Matrix. 

Jack Sherman and Winifred J. Morrison, The Texas Company Research Laboratories, 
Beacon, N. Y. 

2. The Distribution of the Number of Exceedances. 

E. J. Gumbel, New York, N. Y., and H. von Schelling, Naval Research Laboratory, 
New London, Connecticut. 

3. Note on the Power Function of a Quality Control Chart. 

Leo A. Aroian, Hunter College. 

4. Tests between Two Means or Regression Coefficients When Observations Are of Unequal 
Precision. 

Uttam Chand, University of North Carolina. 

5. Functional Expansions. 

Eugene W. Pike, Boston, Massachusetts. 

6. The Geometric Range for Distributions of Cauchy's Type. 

E. J. Gumbel, New York, N. Y., and R. D. Keeney, Metropolitan Life Insurance 
Company, New York, N. Y. 

7. On Sums of Random Integers Reduced Modulo m. 

A. Dvoretzky, Hebrew University, Jerusalem, and Institute for Advanced Study, and 
J. Wolfowitz, Columbia University. 

8. The Corpuscle Problem: Estimating the Surface-Volume Ratio of a Corpuscle of Arbitrary 
Shape. 

Jerome Cornfield, National Institute of Health, and Harold W. Chalkley, National 
Cancer Institute, Bethesda, Maryland. 

9. Generalized Hit Probabilities with a Gaussian Target. 

D. A. S. Fraser, Princeton University. 

10. A New Continuous Sampling Inspection Plan Based on an Analysis of Costs. 

F. E. Satterthwaite, General Electric Company, Bridgeport, Connecticut. 

11. On Levels of Significance of the F and Beta Distributions. (By title) 

Leo A. Aroian, Hunter College. 

12. Certain Statistics for Samples of 3 from a Rectangular Distribution. (By title) 

Julius Lieblein, Statistical Engineering Laboratory, National Bureau of Standards. 

13. The Choice of Lot Inspection Plans on the Basis of Cost. (By title) 

F. E. Satterthwaite and Burton Grad, General Electric Company, Bridgeport, 
Connecticut. 

On Friday evening a dinner was held at the Men’s Faculty Club. 

S. B. Littauer 
Assistant Secretary 



CONSTITUTION OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 

ARTICLE 1 
Purpose 

The Institute of Mathematical Statistics is a society for encouraging the 
development, dissemination, and application of mathematical statistics. 

ARTICLE 2 

Members 

The Institute shall have Members and Institutional Members. Applications 
for membership must be approved by the Council. The Council may delegate 
this authority. 

Except for nonpayment of dues, no Member or Institutional Member shall be 
expelled or suspended except by three-fourths vote of the Council. 

ARTICLE 3 
Officers 

The Officers of the Institute shall be the President, the President-Elect, the 
Secretary, the Treasurer, and the Editor. The terms of office of the Secretary, 
the Treasurer and the Editor shall be three years. The terms of office of the 
President and the President-Elect shall be one year. The President-Elect shall 
succeed the President in that office. If the President is incapacitated, the 
President-Elect shall act as President, or, in case the President-Elect is also 
incapacitated the Secretary shall so act. Incapacity shall be determined by the 
Council. 

The President shall act as chairman of the Council and of the Executive Com¬ 
mittee, and shall appoint the Committees and representatives of the Institute, 
with the exception of the Committee on Fellows and the Executive Committee. 
Such Committee appointments shall be for terms of not more than three years, 
provided that committee appointments extending beyond the current year shall 
be either to standing committees with regularly rotating membership, or to 
temporary committees assigned specific tasks. 

The Treasurer shall present financial statements to the Council and shall bring 
condensed statements to the attention of the Members. 

The Secretary shall record the actions of the Council and of the Executive 
Committee and of Institute meetings, arrange for and inform the Members of 
meetings and conduct the correspondence of the Institute except as otherwise 
assigned by the Executive Committee. The Secretary may appoint Assistant 
Secretaries to assist him in connection with specified meetings or for other 
occasions. The offices of Secretary and Treasurer may be combined. 



ARTICLE 4 
Council 

The Council shall consist of not less than twelve elected members in addition 
to the Officers of the Institute except that vacancies in the Council occurring 
subsequent to an election shall not be filled until the next annual election. 

Elected members shall be elected for terms of three years, the terms of approxi¬ 
mately one-third of them terminating each year. 

The Council, representing the Members, shall determine the policies and 
supervise the affairs of the Institute in accordance with any Bylaws the Institute 
may adopt. It shall determine the standing committees of the Institute and 
the number of elected members of the Council. 

The Council shall elect the Secretary, the Treasurer, and the Editor, by 
majority vote. The Council shall determine the number, if any, of Associate 
Secretaries, Associate Treasurers and Associate Editors. The Secretary shall 
nominate Associate Secretaries, the Treasurer shall nominate Associate 
Treasurers, and the Editor shall nominate Associate Editors which the Council 
may elect by majority vote. Such Associate Secretaries, Treasurers, and 
Editors shall be non-voting members of the Council. 

The Council shall meet at least twice a year, usually at times of meetings of 
the Institute, and otherwise at the call of the President or the call of any five 
members of the Council. Any voting member unable to be present may appoint, 
in writing, a representative to speak for him, and such representative shall be 
entitled to vote. A quorum shall be seven persons entitled to vote. Majorities 
and other fractions of the Council are to be based on the number of persons 
present and entitled to vote. 

ARTICLE 5 
Executive Committee 

The Officers shall constitute the Executive Committee of the Council, and 
shall conduct the affairs of the Institute. 

The Executive Committee may create temporary committees with assigned 
tasks coming within the scope of the Institute. 

ARTICLE 6 

Nominations 

The President shall appoint a Nominating Committee and shall announce 
their names at the annual meeting when he retires from office. This Committee 
shall submit to the Members, through the Secretary and at least sixty days 
before the closing of polls at the next succeeding annual meeting, one nomination 
for President-Elect and a slate containing at least twice as many names as there 
are vacancies on the Council. 

Additional nominations may be made for President-Elect or for the Council 
by a petition signed by twenty Members. Such nominations shall appear on 





the ballot if they are in the hands of the Secretary at least 30 days before the 
closing of polls at the next succeeding annual meeting. In any event, Members 
may vote for names in addition to those nominated. 

ARTICLE 7 

Fellows 

The Council may, by majority vote, elect to fellowship any Member nominated 
by the Committee on Fellows. Such nomination and election shall be on 
the basis of the nominee’s contributions to the development, dissemination, and 
application of mathematical statistics. 

ARTICLE 8 
Committee on Fellows 

The Council shall elect two Fellows annually to serve for three years on the 
Committee on Fellows. One of the Members whose term is next to expire shall 
be designated by the President as chairman. 

ARTICLE 9 

Publications 

The Annals of Mathematical Statistics shall be the official journal of the Insti¬ 
tute. Other publications may be authorized by the Council. 

The publications of the Institute shall be supervised by the Editor, with the 
assistance of the Associate Editors and such committees as the Council may 
approve. 

ARTICLE 10 
Communications 

Public announcements concerning the Institute, including statements of policy, 
recommendations, reports of committees and accounts of Council meetings shall 
be issued by the Secretary or the President with the prior approval of the Council 
or its Executive Committee. Advance publicity concerning meetings may be 
released by authorized Program Committees or Publicity Committees. 

ARTICLE 11 

Affiliation 

By a three-fourths vote, the Council may authorize the affiliation of the 
Institute with any organization whose aims are consistent with those of the 
Institute. 

ARTICLE 12 

Amendments 

This constitution may be amended by an affirmative two-thirds vote of those 
Members voting at any regularly convened meeting of the Institute provided 





notice of such proposed amendment shall have been sent to each Member by 
the Secretary at least thirty days before the date of the meeting at which the 
proposal is to be acted upon. Members may vote in person or by mail. The 
Secretary shall send to the Members any amendments recommended by the 
Executive Committee or proposed through a petition of 25 members of the 
Institute. 

ARTICLE 13 

Emergencies 

In an emergency, as determined by the President or the Executive Committee, 
or by a majority of the Council, a meeting of the Council to transact business 
or a meeting of the Institute to amend the constitution may be conducted by 
mail. 

BY-LAWS OF THE INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE 1 
Duties of Officers 

The President, or in his absence the President-Elect, or in his absence a Mem¬ 
ber appointed by the Executive Committee, shall preside at business meetings 
of the Institute. 

The Treasurer shall send out calls for annual dues, pay all bills for expenditures 
authorized by the Institute, Council, or Executive Committee; keep a detailed 
account of all receipts and expenditures; prepare a financial statement at the 
end of each fiscal year and present an abstract of same at a business meeting of 
the Institute after it has been audited by a Member or Members appointed by 
the President, to whom such Member or Members shall report. 

The Secretary shall, subject to the direction of the Council, have charge of 
the archives and other tangible and intangible property of the Institute and shall, 
upon the direction of the Council, publish a classified list of all Members of the 
Institute, and of Institutional Members at their request. 

The Editor, subject to the direction of the Council, shall have charge of all 
editorial matters, whether relating to the official Journal or to other publications. 
He shall, with the advice and consent of the Council, appoint an Editorial Com¬ 
mittee of not less than twelve Members to cooperate with him for definite terms. 
All appointments to the Editorial Committee shall terminate with the appoint¬ 
ment of a new Editor. 


ARTICLE 2 
Dues 

Members shall pay seven dollars at the time of admission to membership and 
shall receive the full current volume of the official Journal. Thereafter Members 
shall pay seven dollars annual dues, of which five dollars shall be for a subscription 
to the Official Journal. There shall be the following exceptions: 





A. Two Members of the Institute who are husband and wife may elect to 
receive one copy of the Official Journal between them, when their dues 
shall each be reduced by twenty-five percent. 

B. Any Member may make a payment in place of all succeeding annual dues 
based on a suitable table and rate of interest specified by the Council. 

C. Any Member on active military duty may notify the Treasurer that he 
wishes neither to pay dues nor to receive the Official Journal during the 
current year. He may receive the Official Journal for the suspended 
years on payment of one-half of the suspended dues within one year after 
resuming payment of annual dues. 

D. Any Member who resides outside the Western Hemisphere shall pay five 
dollars annual dues. 

Institutional Members shall pay annual dues of at least $100. For each $100 
of annual dues, an Institutional Member shall receive two copies of the Official 
Journal, one bound, and shall be entitled to designate one person to have the 
full prerogatives of a member without further payment of dues (including the 
receipt of a personal copy of the Official Journal). Twenty-five dollars of each 
$100 shall be allocated to the three subscriptions to the Official Journal and the 
binding of one copy. 

Annual dues shall be payable on the first day of January of each year. 

It shall be the duty of the Treasurer to notify by mail anyone whose dues are 
six months in arrears, enclosing a copy of this article. If such person fails to 
pay such dues within three months from the date of mailing such notice, the 
Treasurer shall report the delinquent to the Council, who may suspend the 
delinquent from membership and who may reinstate the delinquent upon payment 
of arrears. 

ARTICLE 3 
Salaries 

The Institute shall not pay a salary to any Officer, Councilor, or member of 
any committee. 


ARTICLE 4 

Amendments 

These Bylaws may be amended in the same manner as the Constitution or, 
if the proposed amendment has been previously approved by the Council, by a 
majority vote at any regularly convened meeting. 






ON THE THEORY OF SYSTEMATIC SAMPLING, II 

By William G. Madow 1 
Institute of Statistics, University of North Carolina 

1. Summary and introduction. In an earlier paper, 2 [1] an approach to the 
problem of systematic sampling was formulated, and the associated variance 
obtained. Several forms of the population were assumed. The efficiency of the 
systematic design as compared with the random and stratified random design 
was evaluated for these forms. It was remarked that as the size of sample increased 
the variance of a systematic design might also increase, contrary to the 
behavior of variances in the random sampling design. This possibility was verified 
in [2]. 

One approach to the study of systematic designs, given by Cochran [3] removed 
this difficulty to some extent by changing the problem to one of the expected 
variance, and supposing the elements of the population to be random variables. 
He showed that if the correlogram of these random variables is concave upwards, 
then the expected variance of the systematic design would be less, and often 
considerably less, than the variance of a stratified design. 

In the present paper the results of the earlier papers are extended to the sys¬ 
tematic sampling of clusters of equal and unequal sizes. Some comments on 
systematic sampling in two dimensions are included. 

In section 2 we derive two theorems that have considerable applications in 
many parts of sampling. Although it has been common for people working in 
sampling theory to tell each other that these theorems ought to be true, yet no 
reference seems to exist. 

In section 3 we develop the implications of a remark [1, p. 13] that in designing 
sample surveys we should try to induce negative correlation between strata. In 
Theorem 3 we obtain sufficient conditions for the correlation to be negative. 
The lemma and Theorem 4 given in Section 4 enable us to extend the uses of 
Theorem 3 in practice. As an application of these results, we show that if a 
population has a concave upwards correlogram, and if strata are defined in an 
optimum fashion for the selection of one element at random from each stratum, 
then we can define a systematic type design that will be more efficient than 
independent random selection from each stratum. 

In sections 5 and 6 we obtain various results in the systematic sampling of 
clusters largely as applications of the more general theorems of the earlier sections. 
In general the results are of a nature similar to those of [1] and [3] in that 
the formulae show the conditions under which systematic sampling may be 
expected to be more efficient than random or stratified random sampling. We 
have not, however, applied these formulae to specified types of populations. 

1 Submitted for publication, November, 1948. Parts of this paper were prepared while 
the author was Visiting Professor of Statistics at the University of Sao Paulo, Brazil. 

2 References to the articles and book cited are given by Roman numerals. 



From [1, 2 and 3] it is already apparent that this work will be useful and such 
studies should be more valuable when made in connection with important types 
of surveys or data than when made as illustrations in a general paper. 

2. Random events and conditional expectations. Almost invariably, samples 
are selected in several stages. For example, to select a sample of households from 
a city one frequently used method is the following two-stage sampling plan: 

a. A map of the city showing the location of each block is obtained and 
brought up-to-date. 

b. Using this map, a sample of the blocks of the city is selected (this is stage 1). 

c. From the households on the blocks selected in stage 1, a subsample of households 
is selected (this is stage 2). 
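
The two-stage plan above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `two_stage_sample`, the toy dictionary `city`, and the use of simple random sampling without replacement at both stages are hypothetical choices, since the text leaves the selection method at each stage open.

```python
import random

def two_stage_sample(households_by_block, n_blocks, n_per_block, rng):
    """Stage 1: draw blocks; stage 2: draw households within each drawn block."""
    # Stage 1: simple random sample of block labels (an assumed design).
    blocks = rng.sample(sorted(households_by_block), n_blocks)
    sample = {}
    for block in blocks:
        households = households_by_block[block]
        # Stage 2: subsample households on the selected block only.
        size = min(n_per_block, len(households))
        sample[block] = rng.sample(households, size)
    return sample

# Hypothetical city: 6 blocks holding 4 to 9 households each.
city = {f"block{b}": [f"b{b}-h{h}" for h in range(1, 4 + b)] for b in range(1, 7)}
chosen = two_stage_sample(city, n_blocks=3, n_per_block=2, rng=random.Random(1))
```

Only the blocks drawn in stage 1 need to be listed in detail, which is the practical reason for sampling in stages.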

In this section, we give a general approach for evaluating the means and 
variances associated with multi-stage sampling. This approach has the ad¬ 
vantage of at once yielding the contributions to the variance arising from 
each stage. Furthermore, the theorems presented are useful in calculating vari¬ 
ances even when our interest is not in multi-stage sampling. The theorems are 
presented in general terms because of their wide application in sampling. 

We shall say that the result of performing an operation is a random event $A^*$ 
if the result can assume $m$ possible states $A_1, \cdots, A_m$ with probabilities 
$p_1, \cdots, p_m$, where 

$$P\{A^* = A_i\} = p_i, \qquad \sum_{i=1}^{m} p_i = 1,$$ 

and $P\{A^* = A_i\}$ is read "the probability that the random event $A^*$ assumes 
the state $A_i$." 

One illustration of an operation is the operation of selecting a sample of blocks. 
If there are $N$ blocks in the city of which we select $n$ in such a way that each 
set of $n$ of the $N$ blocks is a possible sample, then there are $C_n^N$ possible samples. 
In this case $m = C_n^N$ and the $C_n^N$ possible samples are the $m$ states of $A^*$, "the 
result of selecting the sample of blocks." Furthermore, if each of the possible 
samples of blocks is equally likely to be selected, then 

$$P\{A^* = A_i\} = \frac{1}{C_n^N}.$$ 

The random event $A^*$ may also be the taking on by a random variable of 
one of its possible values. If $z^*$ is a random variable having possible values 
$z_1, \cdots, z_m$ with probabilities $p_1, \cdots, p_m$, then we can define the states of 
$A^*$ to be $A_i$, where $A_i$ is "$z^* = z_i$." 

Thus the notion of a random event includes the two types of randomness 
that are met in selecting samples. 

Let $x'$ be a random variable. Then, by the conditional expectation of $x'$ subject 
to the random event $A^*$ is meant the random variable $E^*(x' \mid A)$ whose possible 
values are $E(x' \mid A_i)$, $i = 1, \cdots, m$, and whose probabilities are $p_i$, that is 

$$P\{E^*(x' \mid A) = E(x' \mid A_i)\} = p_i = P\{A^* = A_i\},$$ 

where 

$$E(x' \mid A_i) = \sum_{j=1}^{N_i} x_{ij}\, p_j(A_i), \tag{2.1}$$ 

$x_{ij}$ is the $j$th of the $N_i$ possible values of $x'$ when $A_i$ occurs, and 

$$p_j(A_i) = P\{x' = x_{ij} \mid A_i\}$$ 

is "the probability that $x' = x_{ij}$ given that $A_i$ occurs." It should be noted 
that if 

$$p_{ij} = P\{x' = x_{ij}\},$$ 

then 

$$p_{ij} = P\{x' = x_{ij},\ A^* = A_i\}$$ 

since the fact that $x' = x_{ij}$ implies the occurrence of $A_i$. Then 

$$p_i\, p_j(A_i) = p_{ij}. \tag{2.2}$$ 

We state Theorems 1 and 2 without proof since their proofs are immediate. 
Theorem 1. The expected value of the random variable $E^*(x' \mid A)$ is $E x'$, i.e. 

$$E\{E^*(x' \mid A)\} = E x'.$$ 

By $\sigma^*_{x'y'|A}$ we shall mean the random variable whose possible values are 
$\sigma_{x'y'|A_i}$, $i = 1, \cdots, m$, where 

$$\sigma_{x'y'|A_i} = E\{[x' - E(x' \mid A_i)][y' - E(y' \mid A_i)] \mid A_i\}$$ 

and 

$$P\{\sigma^*_{x'y'|A} = \sigma_{x'y'|A_i}\} = p_i = P\{A^* = A_i\},$$ 

i.e. 

$$\sigma^*_{x'y'|A} = E^*\{[x' - E^*(x' \mid A)][y' - E^*(y' \mid A)] \mid A\}.$$ 

Furthermore, the symbol $\sigma_{E^*(x'|A)E^*(y'|A)}$ will stand for "the covariance of the 
two random variables $E^*(x' \mid A)$ and $E^*(y' \mid A)$." The corresponding definitions 
of variance are obtained by replacing $y'$ by $x'$ above. 

Theorem 2. If $x'$ and $y'$ are random variables, then 

$$\sigma_{x'y'} = E\sigma^*_{x'y'|A} + \sigma_{E^*(x'|A)E^*(y'|A)}$$ 

and 

$$\sigma^2_{x'} = E\sigma^{*2}_{x'|A} + \sigma^2_{E^*(x'|A)}.$$ 

We note that, since the $p_{ij}$, $p_i$ and $p_j(A_i)$ are not specified, Theorems 1 and 
2 are valid for any two-stage plan. The generalizations of Theorems 1 and 2 
to multi-stage plans are obvious, but in practice it often turns out to be simpler 
to apply the theorems several times. 
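
Theorems 1 and 2 can be checked by exact arithmetic on a small example. The sketch below assumes a hypothetical two-stage design with three states; the numbers in `states` and the helper names `mean` and `cov` are illustrative and do not come from the paper.

```python
from fractions import Fraction as F

# Hypothetical design: state A_i occurs with probability p_i and, given A_i,
# the pair (x', y') takes each listed value with the stated conditional probability.
states = [
    (F(1, 2), [(1, 4, F(1, 2)), (3, 2, F(1, 2))]),
    (F(1, 3), [(2, 5, F(1, 4)), (6, 1, F(3, 4))]),
    (F(1, 6), [(0, 0, F(1, 3)), (4, 8, F(2, 3))]),
]

def mean(pairs):
    """Expected value of a list of (value, probability) pairs."""
    return sum(v * p for v, p in pairs)

def cov(triples):
    """Covariance of a list of (x, y, probability) triples."""
    ex = mean([(x, p) for x, _, p in triples])
    ey = mean([(y, p) for _, y, p in triples])
    return sum((x - ex) * (y - ey) * p for x, y, p in triples)

# Unconditional joint distribution, using p_ij = p_i * p_j(A_i) as in (2.2).
joint = [(x, y, p_i * p) for p_i, cond in states for x, y, p in cond]

# For each state: its probability, conditional means, and conditional covariance.
cond_stats = [
    (p_i,
     mean([(x, p) for x, _, p in cond]),
     mean([(y, p) for _, y, p in cond]),
     cov(cond))
    for p_i, cond in states
]

# Theorem 1: the mean of the conditional means equals the overall mean.
total_mean_x = mean([(x, p) for x, _, p in joint])
mean_of_cond_means = sum(p_i * ex for p_i, ex, _, _ in cond_stats)

# Theorem 2: total covariance = expected conditional covariance
#            + covariance of the conditional means.
total_cov = cov(joint)
expected_cond_cov = sum(p_i * c for p_i, _, _, c in cond_stats)
cov_of_cond_means = cov([(ex, ey, p_i) for p_i, ex, ey, _ in cond_stats])
```

The two terms on the right of Theorem 2 are the contributions of the second and first stages respectively, which is the stage-by-stage breakdown referred to at the start of this section.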





It would be easy to give applications of Theorems 1 and 2 but these are not 
essential for our purposes in this paper. As remarked in the introduction, these 
two theorems have long been part of what we may call the folklore of sampling. 

3. Stratified sampling and negative correlation, with an application to systematic 
sampling. In discussing plans for sampling from a stratified population 
it is customary to suppose that if $x'$ is an estimate and $x' = x'_1 + \cdots + x'_L$, 
where $x'_i$ is the contribution to $x'$ arising from the $i$th of the $L$ strata, then the 
sampling is to be so done that the random variables $x'_i$ and $x'_j$, $i \ne j$, are 
independent. 

In [1, p. 13] it was noted that if a population were stratified, and if the elements 
were so selected that the contributions from different strata were negatively 
correlated, it would follow that the variance of the estimate would be less than 
if the contributions were independent but had the same covariances within 
strata. This was, of course, an immediate conclusion from the fact that 

$$\sigma^2_{x'} = \sum_{i,j=1}^{L} \sigma_{x'_i x'_j}$$ 

and, hence, if 

$$C = \sum_{i \ne j} \sigma_{x'_i x'_j} < 0 \tag{3.1}$$ 

then $\sigma^2_{x'}$ is less than it would be if $C = 0$. If $C < 0$ we shall say that the sample 
design has "negative correlation." 

It is obvious that any population may be taken to be itself a sample, a sample 
from the possible populations that might have been produced by the forces that 
determined the existing population. Inasmuch as sampling designs are often 
chosen on the basis of a knowledge of the dominating forces and some past 
experience, it is realistic to consider not only the expected values and variances 
for a specific population but also their expected values over all possible popula¬ 
tions determined by the same forces. Cochran [3] has given one illustration of 
the usefulness of considering the expected variance of a sample design. He 
considered the elements $x_1, \cdots, x_n$ of the population themselves to be random 
variables and supposed that $E x_i = \mu$ and $E(x_i - \mu)^2 = \sigma^2$. For his purposes 
it was also convenient to suppose that if $u > 0$ then $E(x_i - \mu)(x_{i+u} - \mu) = 
\rho_u \sigma^2$. It was then possible for him to make realistic hypotheses concerning the 
correlogram, i.e. the $\rho_u$ considered as a function of $u$, that would not have been 
reasonable in dealing with a specific population. He thus obtained general 
conclusions concerning the expected efficiency of systematic sampling designs 
as compared with random and stratified random designs. 

In this paper we shall consider not only the expected values and variances 
for the given finite population but also the expected values of these expected 
values and variances under the assumption that the elements of the population 
are themselves random variables. We shall use $\mathcal{E}$ to denote the expected value 
considering the elements of the population to be random variables and as before 
use $E$ for expected values based on the specified finite population. 

Then 

$$\mathcal{E}\sigma^2_{x'} = \sum_{i,j=1}^{L} \mathcal{E}\sigma_{x'_i x'_j},$$ 

and if $\mathcal{E}C < 0$ we shall say that the design has "expected negative correlation." 
We now propose to obtain the beginnings of an approach to sample design 
when it is possible to introduce or take advantage of negative correlation or 
expected negative correlation through the sample design. 

To simplify, we shall begin by considering two strata and shall suppose that 
the possible values of $x'$ are $x_1, \cdots, x_n$ while the possible values of $y'$ are 
$y_1, \cdots, y_n$. Furthermore, we shall suppose the sampling to be so done that 

$$P\{x' = x_i\} = P\{y' = y_i\} = P\{x' = x_i,\ y' = y_i\} = p_i > 0,$$ 

so that $\sum_{i=1}^{n} p_i = 1$ and $P\{x' = x_i,\ y' = y_j\} = 0$ if $i \ne j$. 

Under the above assumptions, it follows that 

$$\sigma_{x'y'} = \sum_{i=1}^{n} p_i x_i y_i - \sum_{i,j=1}^{n} p_i p_j x_i y_j. \tag{3.2}$$ 

The symbol $\varphi_{ij} > 0$ means that $\varphi_{ij} \ge 0$ for all $i$ and $j$ and $\varphi_{ij} > 0$ for at least 
one pair $i, j$. We shall say that if $(x_i - x_j)(y_i - y_j) > 0$ then the sets $(x)$ and 
$(y)$, where $(x)$ stands for $x_1, \cdots, x_n$ and $(y)$ for $y_1, \cdots, y_n$, are similarly ordered, 
and if $(x_i - x_j)(y_i - y_j) < 0$ then these sets are oppositely ordered. Then it 
is easy to prove directly [4, p. 43] that if the values are oppositely ordered, then 
$\sigma_{x'y'} < 0$, and if they are similarly ordered then $\sigma_{x'y'} > 0$. 

A somewhat more general result is the following: 

Theorem 3. Let $n \le k$, let 

$$b = \sum_{i=1}^{n} \sum_{j=1}^{k} a_{ij} w_i z_j$$ 

be a real bilinear form, and let 

$$t = \sum_{i=1}^{n} a_{ii} w_i$$ 

be a real linear form, where $w_i > 0$, $z_j > 0$ and $\sum_{i=1}^{n} w_i = \sum_{j=1}^{k} z_j = 1$. 

Then a sufficient condition that $b > t$ is 

$$a_{ij} > a_{ii}. \tag{3.3}$$ 

If $k = n$ and $w_i = z_i$ then $b > t$ if 

$$a_{ij} + a_{ji} > a_{ii} + a_{jj}. \tag{3.4}$$ 





Proof. Since 

$$b - t = -\sum_{i=1}^{n} a_{ii} w_i (1 - z_i) + \sum_{i \ne j} a_{ij} w_i z_j,$$ 

and since 

$$1 - z_i = \sum_{j \ne i} z_j,$$ 

it follows that 

$$b - t = \sum_{i \ne j} (a_{ij} - a_{ii}) w_i z_j.$$ 

Hence, $b > t$ if (3.3) holds. Also, if $k = n$ and $w_i = z_i$ then $b > t$ if (3.4) holds. 
Some obvious generalizations of Theorem 3 have been omitted since we do not 
need them. 
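
The identity used in the proof of Theorem 3 can be checked numerically. The sketch below uses exact fractions; the sizes `n` and `k`, the random seed, and the way the coefficients are built to satisfy (3.3) are illustrative assumptions only.

```python
from fractions import Fraction as F
import random

rng = random.Random(0)
n, k = 3, 5  # n <= k, as in Theorem 3

def random_weights(size):
    """Positive weights summing to one."""
    raw = [rng.randint(1, 9) for _ in range(size)]
    return [F(v, sum(raw)) for v in raw]

w, z = random_weights(n), random_weights(k)

# Coefficients built so that a_ij >= a_ii + 1 whenever j != i, which satisfies (3.3).
diag = [rng.randint(-5, 5) for _ in range(n)]
a = [[diag[i] if j == i else diag[i] + rng.randint(1, 4) for j in range(k)]
     for i in range(n)]

# Bilinear form b and linear form t of the theorem.
b = sum(a[i][j] * w[i] * z[j] for i in range(n) for j in range(k))
t = sum(a[i][i] * w[i] for i in range(n))

# Right-hand side of the last display in the proof: sum over j != i.
off_diagonal = sum((a[i][j] - a[i][i]) * w[i] * z[j]
                   for i in range(n) for j in range(k) if j != i)
```

Because every weight is positive, the sign of `b - t` is controlled entirely by the differences `a[i][j] - a[i][i]`, which is the content of condition (3.3).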

To obtain the result that $\sigma_{x'y'} < 0$ if the sets $(x)$ and $(y)$ are oppositely ordered, 
we make the identifications $a_{ij} = x_i y_j$ and $z_i = w_i = p_i$. Then (3.4) holds and 
substituting we have 

$$a_{ii} + a_{jj} - a_{ij} - a_{ji} = (x_i - x_j)(y_i - y_j) \tag{3.5}$$ 

so that if the values are oppositely ordered, $\sigma_{x'y'} < 0$, and hence the two strata 
have negative correlation. 

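To consider expected negative correlation we note that 

$$\mathcal{E}\sigma_{x'y'} = \sum_{i=1}^{n} p_i \sigma_{ii} - \sum_{i,j=1}^{n} p_i p_j \sigma_{ij} \tag{3.6}$$ 

where we suppose that $\mathcal{E}x_i = \mu$, $\mathcal{E}y_j = \nu$ and 

$$\mathcal{E}(x_i - \mu)(y_j - \nu) = \sigma_{ij}$$ 

so that in this case $\sigma_{ii}$ is a covariance, not a variance. 

If we put $a_{ij} = \sigma_{ij}$ and $z_i = w_i = p_i$, then (3.4) holds and we obtain, as 
sufficient for $\mathcal{E}\sigma_{x'y'}$ to be negative, that 

$$\sigma_{ij} + \sigma_{ji} > \sigma_{ii} + \sigma_{jj} \tag{3.7}$$ 

or, if we define $\rho_{ij}$ by the equation 

$$\sigma_{ij} = \rho_{ij}\, \sigma_x \sigma_y,$$ 

where $\sigma_x^2 = \mathcal{E}(x_i - \mu)^2$ and $\sigma_y^2 = \mathcal{E}(y_j - \nu)^2$, we have 

$$\rho_{ij} + \rho_{ji} > \rho_{ii} + \rho_{jj} \tag{3.8}$$ 

as a sufficient condition for $\mathcal{E}\sigma_{x'y'} < 0$. 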

Let us consider the systematic sampling of single elements. In systematic 
sampling, we assume a population of $kn$ ordered elements $x_1, x_2, \cdots, x_k, 
x_{1+k}, \cdots, x_{2k}, \cdots, x_{1+(n-1)k}, \cdots, x_{nk}$ of which we wish to estimate the 
arithmetic mean $\bar{x}$. As our estimate we use 

$$x' = (x'_1 + \cdots + x'_n)/n$$ 

where $x'_1$ is selected at random from $x_1, \cdots, x_k$ and if $x'_1 = x_\alpha$ then $x'_i = x_{\alpha+(i-1)k}$, 
$i = 2, \cdots, n$. Thus, $x'$ may be interpreted as an estimate based on a stratified 
population, the $i$th stratum consisting of 

$$x_{1+(i-1)k}, \cdots, x_{ik}$$ 

and 

$$P\{x'_i = x_{\alpha+(i-1)k},\ x'_j = x_{\alpha+(j-1)k}\} = 1/k$$ 

while 

$$P\{x'_i = x_{\alpha+(i-1)k},\ x'_j = x_{\beta+(j-1)k}\} = 0, \quad \text{if } \alpha \ne \beta.$$ 

Then 

$$\sigma^2_{x'} = \left(\frac{1}{n}\right)^2 \sum_{i,j=1}^{n} \sigma_{x'_i x'_j}$$ 

where 

$$\sigma_{x'_i x'_j} = \frac{1}{k} \sum_{\alpha=1}^{k} x_{\alpha+(i-1)k}\, x_{\alpha+(j-1)k} - \bar{x}_i \bar{x}_j.$$ 

Hence, any two strata that are oppositely ordered will yield a negative contribution 
to the variance. However, since it is not possible for all strata to be negatively 
ordered, we do not thus obtain a useful result and must return to the 
consideration of $C$ or $\sigma^2_{x'}$ itself as was done in [1]. If, however, we make Cochran's 
assumptions, and consider $\mathcal{E}\sigma_{x'_i x'_j}$, it follows that for the $i$th and $j$th strata 

$$\rho_{\alpha\beta} = \rho_{(j-i)k+\beta-\alpha},$$ 

and (3.8) becomes 

$$\rho_{(j-i)k+(\beta-\alpha)} + \rho_{(j-i)k+(\alpha-\beta)} > 2\rho_{(j-i)k}, \tag{3.9}$$ 

i.e. the correlation function $\rho_u$ must be concave upwards, which Cochran showed 
by other means. By considering $\mathcal{E}C$ it is possible to show that a sort of average 
concavity is all that is required of the correlogram for systematic sampling to 
have a smaller variance than stratified random sampling. 
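
The role of the cross term $C$ of (3.1) in systematic sampling can be illustrated with exact arithmetic on small populations. The sketch below is only an illustration: the two populations `trend` and `serpentine` and all function names are hypothetical, and the stratified design taken for comparison is one element drawn independently at random from each run of $k$ consecutive elements.

```python
from fractions import Fraction as F

def stratum_values(pop, k, i):
    """Elements of the i-th stratum (0-based) when pop is split into runs of k."""
    return pop[i * k:(i + 1) * k]

def stratum_cov(pop, k, i, j):
    """Covariance of x'_i and x'_j when one random start fixes the position in every stratum."""
    si, sj = stratum_values(pop, k, i), stratum_values(pop, k, j)
    return F(sum(u * v for u, v in zip(si, sj)), k) - F(sum(si), k) * F(sum(sj), k)

def systematic_variance(pop, k):
    """Exact variance of the systematic sample mean over the k equally likely starts."""
    n = len(pop) // k
    xbar = F(sum(pop), len(pop))
    means = [F(sum(pop[a + i * k] for i in range(n)), n) for a in range(k)]
    return sum((mu - xbar) ** 2 for mu in means) / k

def stratified_variance(pop, k):
    """Variance of the mean when one element is drawn independently from each stratum."""
    n = len(pop) // k
    return sum(stratum_cov(pop, k, i, i) for i in range(n)) / n ** 2

def cross_term(pop, k):
    """The quantity C: the sum of covariances between different strata."""
    n = len(pop) // k
    return sum(stratum_cov(pop, k, i, j)
               for i in range(n) for j in range(n) if i != j)

trend = list(range(1, 13))                           # every pair of strata similarly ordered
serpentine = [1, 2, 3, 4, 8, 7, 6, 5, 9, 10, 11, 12]  # middle stratum reversed
```

For `trend` the cross term is positive and the systematic design does worse than the stratified one; reversing the middle stratum makes the cross term negative and reverses the comparison.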


4. Conditions for negative correlation when the strata are of unequal sizes 
with an application to systematic sampling. Often, as in the systematic selection 
of clusters with probability proportionate to size (discussed in Section 5) the 
simplified situation dealt with in Theorem 3 does not directly apply. However, 
Theorem 3 may be used to advantage by the following device. 

Let us suppose the possible values of $x'$ to be $x_1, \cdots, x_n$ and those of $y'_0$ to be 
$y^0_1, \cdots, y^0_k$, $k \ge n$, and let 

$$P\{y'_0 = y^0_\beta \mid x' = x_\alpha\} = p_{\beta|\alpha}$$ 

so that if we define 

$$y_\alpha = \sum_{\beta=1}^{k} y^0_\beta\, p_{\beta|\alpha} \tag{4.1}$$ 

then 

$$y_\alpha = E[y'_0 \mid x' = x_\alpha].$$ 

If we define $y'$ to be a random variable having possible values $y_1, \cdots, y_n$ 
with probabilities $p_1, \cdots, p_n$ where 

$$p_\alpha = P\{x' = x_\alpha\}$$ 

it follows that 

$$y' = E^*(y'_0 \mid x')$$ 

and 

$$\sigma_{x'y'_0} = \sigma_{x'y'}.$$ 

Clearly, Theorem 3 is valid for the random variables $x'$ and $y'$. 

Consequently, we need only determine what restrictions the conditional 
probabilities, $p_{\beta|\alpha}$, and the values, $y^0_\beta$, need satisfy for the sets $x_1, \cdots, x_n$ and 
$y_1, \cdots, y_n$ to be oppositely ordered or for (3.7) to hold. 

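Substituting for $y_\alpha$ and $y_\gamma$ in (3.5) we see that if 

$$(x_\alpha - x_\gamma) \sum_{\beta=1}^{k} y^0_\beta\, (p_{\beta|\alpha} - p_{\beta|\gamma}) < 0 \tag{4.2}$$ 

then $\sigma_{x'y'} < 0$. 

Let 

$$\sigma_{\alpha\beta} = \mathcal{E}(x_\alpha - \mu)(y^0_\beta - \nu).$$ 

Then substituting in (3.7) we see that if 

$$\sum_{\beta=1}^{k} (p_{\beta|\alpha} - p_{\beta|\gamma})(\sigma_{\alpha\beta} - \sigma_{\gamma\beta}) \le 0 \tag{4.3}$$ 

or if 

$$\sum_{\beta=1}^{k} (p_{\beta|\alpha} - p_{\beta|\gamma})(\rho_{\alpha\beta} - \rho_{\gamma\beta}) \le 0 \tag{4.4}$$ 

then $\mathcal{E}\sigma_{x'y'} \le 0$. 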

In order to use (4.2) and (4.3) the following well-known lemma is often useful. 
Lemma. If $\xi_1 \le \xi_2 \le \cdots \le \xi_k \le 0$ and the quantities $\epsilon_1, \cdots, \epsilon_k$ are such that 

$$\sum_{\beta=1}^{s} \epsilon_\beta \ge 0, \qquad s = 1, \cdots, k,$$ 

then 

$$\sum_{\beta=1}^{k} \xi_\beta \epsilon_\beta \le 0.$$ 
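
The lemma can be seen from summation by parts, which is a standard argument and is not spelled out in the paper. The sketch below checks it on one randomly generated case; the seed, the length `k`, and the integer ranges are arbitrary assumptions.

```python
from itertools import accumulate
import random

rng = random.Random(2)
k = 6

# xi_1 <= xi_2 <= ... <= xi_k <= 0
xi = sorted(rng.randint(-9, 0) for _ in range(k))

# Choose nonnegative partial sums first, then difference them to get the epsilons.
partial = [rng.randint(0, 5) for _ in range(k)]
eps = [partial[0]] + [partial[s] - partial[s - 1] for s in range(1, k)]

# Direct evaluation of the sum in the lemma.
direct = sum(x * e for x, e in zip(xi, eps))

# Summation by parts:
#   sum_s xi_s eps_s = sum_{s<k} (xi_s - xi_{s+1}) S_s + xi_k S_k,
# where S_s is the s-th partial sum of the epsilons.
sums = list(accumulate(eps))
by_parts = sum((xi[s] - xi[s + 1]) * sums[s] for s in range(k - 1)) + xi[-1] * sums[-1]
```

Every term of the rearranged sum is a nonpositive number times a nonnegative one, so the total cannot be positive.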


Let us use this lemma to obtain another theorem that will be helpful in showing 
negative or expected negative correlation between strata. 

Theorem 4. Let $b$ be a bilinear form 

$$b = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij} w_i z_j.$$ 

Let $\sum_{i=1}^{s} w_i > 0$, $\sum_{j=1}^{s'} z_j > 0$, $s = 1, \cdots, n - 1$, $s' = 1, \cdots, m - 1$, and 

$$\sum_{i=1}^{n} w_i = \sum_{j=1}^{m} z_j = 0. \tag{4.5}$$ 

Let 

$$\delta_{ij} = a_{ij} - a_{i+1,j} - a_{i,j+1} + a_{i+1,j+1}.$$ 

Then a sufficient condition that $b < 0$ is $\delta_{ij} < 0$. 

Proof. Upon substituting for $w_n$ and $z_m$ in $b$ from (4.5) we see that 

$$b = \sum_{i=1}^{n-1} \sum_{j=1}^{m-1} \delta'_{ij} w_i z_j$$ 

where 

$$\delta'_{ij} = a_{ij} - a_{im} - a_{nj} + a_{nm},$$ 

or, if we define 

$$\xi_j = \sum_{i=1}^{n-1} \delta'_{ij} w_i,$$ 

then 

$$b = \sum_{j=1}^{m-1} \xi_j z_j.$$ 

According to the lemma, it then follows that a sufficient condition that $b < 0$ is 
that 

$$\xi_1 \le \xi_2 \le \cdots \le \xi_{m-1} \le 0.$$ 

Also, a sufficient condition that 

$$\xi_j - \xi_{j+1} \le 0$$ 

is 

$$\delta'_{ij} - \delta'_{i,j+1} \le \delta'_{i+1,j} - \delta'_{i+1,j+1}.$$ 

Then to complete the proof it is only necessary to verify that 
$\delta_{ij} = \delta'_{ij} - \delta'_{i,j+1} - \delta'_{i+1,j} + \delta'_{i+1,j+1}$. 
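
Theorem 4 and the first step of its proof can be checked on a small case. In the sketch below the coefficients $a_{ij} = x_i y_j$, the values in `x` and `y`, and the weights `w` and `z` are hypothetical inputs chosen to meet the hypotheses; they are not taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical inputs: a_ij = x_i * y_j with x increasing and y decreasing,
# so that every delta_ij = (x_i - x_{i+1}) * (y_j - y_{j+1}) is negative.
x = [1, 2, 4, 7]
y = [9, 6, 5]
n, m = len(x), len(y)
a = [[x[i] * y[j] for j in range(m)] for i in range(n)]

# Weights with positive partial sums and zero total, as (4.5) requires.
w = [F(1, 2), F(-1, 4), F(1, 4), F(-1, 2)]
z = [F(1, 3), F(1, 3), F(-2, 3)]

# Bilinear form evaluated directly.
b = sum(a[i][j] * w[i] * z[j] for i in range(n) for j in range(m))

# Form obtained after eliminating w_n and z_m by means of (4.5).
d_prime = [[a[i][j] - a[i][m - 1] - a[n - 1][j] + a[n - 1][m - 1]
            for j in range(m - 1)] for i in range(n - 1)]
b_reduced = sum(d_prime[i][j] * w[i] * z[j]
                for i in range(n - 1) for j in range(m - 1))

# Second differences delta_ij that appear in the sufficient condition.
delta = [[a[i][j] - a[i + 1][j] - a[i][j + 1] + a[i + 1][j + 1]
          for j in range(m - 1)] for i in range(n - 1)]
```

With these inputs `b` equals the product of `sum(x_i * w_i)` and `sum(y_j * z_j)`, which gives a quick independent check on the sign.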

In the preceding pages we have given an identification of systematic with 
stratified sampling where, instead of the selection being made independently 
within strata, the choice of an element from one stratum determines the choice 
from the other strata. In this identification, however, it was assumed that the 
strata contained the same number of elements. Let us now extend this method 
of selecting samples to the case where the strata have different numbers of 
elements. In so doing we shall illustrate the use of the above lemma and Theorem 4. 

Suppose now that the population consists of $N$ elements $x_1, \cdots, x_N$ classified 
into $n$ strata, the $i$th of which contains the $N_i$ elements 

$$x_{N_1+\cdots+N_{i-1}+1}, \cdots, x_{N_1+\cdots+N_i}.$$ 

We shall denote these elements by $x_{i1}, \cdots, x_{iN_i}$. 

We shall select one element from each of these $n$ strata. The element selected 
from the $i$th stratum is written $x'_i$. As the estimate of $\bar{x}$, the arithmetic mean of 
the population, we use 

$$x' = \frac{1}{N} \sum_{i=1}^{n} N_i x'_i$$ 

and it is well known that if the selection is made independently at random from 
each stratum, then 

$$\sigma^2_{x'} = \frac{1}{N^2} \sum_{i=1}^{n} N_i^2 \sigma_i^2$$ 

where $\sigma_i^2$ is the variance of $x'_i$, i.e. the variance of the $i$th stratum. 

Let us now consider an alternative to the usual method. We can suppose that 
$N_1 > 1$ without any loss of generality. (The methods are the same for any stratum 
having $N_i = 1$ and will also yield the same result for any population such that 
either all the $N_i = 1$ or all but one of the $N_i = 1$. Differences occur if at least two 
of the $N_i$ differ from 1.) 

We first choose an element at random from the first stratum. Suppose that 
$x'_1 = x_{1\alpha}$. Then to choose an element from the second stratum, assuming that 
$N_2 > 1$, we proceed as follows: Multiply $N_2$ by any positive integer $k_2$ such that 
$N_2 k_2 / N_1$ is an integer, say, $k_1$. Assign to each element of the second stratum the 
measure of size $k_2$, and form the two sets of cumulative totals $k_2, 2k_2, \cdots, N_2 k_2$ 
and $k_1, 2k_1, \cdots, N_1 k_1$. Then with the measure of size $k_2$ assigned to each element 
of stratum 2, and the measure of size $k_1$ assigned to each element of stratum 1, 
it follows that strata 1 and 2 have the same total size. 

As an example of the arithmetic given below consider the following simple case. 
Suppose that $N_1 = 3$ and $N_2 = 4$. Then if we take for $k_2$ the value 6, it follows 
that $k_1 = 8$. We choose one of the integers 1, 2, 3 with equal probability. If the 
integer 1 is obtained, we have selected the first element of the first stratum and 
choose an integer between 1 and 8 with equal probability. If the selected integer 
is between 1 and 6, the first element of the second stratum is selected. If it is 
7 or 8 the second element of the second stratum is selected. Similarly if the 
second element of the first stratum is selected, then we select an integer between 
9 and 16 with equal probability. If that integer has value $9, \cdots, 12$ the second 
element of the second stratum is selected; if it has value $13, \cdots, 16$ the third 
element is selected. 

The general formulation of the selection procedure for the second stratum is: 

Suppose that $d_0$ is the smallest integer such that $(\alpha - 1)k_1 + 1 \le d_0 k_2$ and that 
$d_1$ is such that $(d_1 - 1)k_2 < \alpha k_1 \le d_1 k_2$. Choose an integer at random from 
$1, \cdots, k_1$ and call that integer $d$. Then, if 

$$(\alpha - 1)k_1 < (\alpha - 1)k_1 + d \le d_0 k_2$$ 

the $d_0$th element is selected from stratum 2; if 

$$d_0 k_2 < (\alpha - 1)k_1 + d \le (d_0 + 1)k_2$$ 

the $(d_0 + 1)$th element is selected; $\cdots$; and if 

$$(d_1 - 1)k_2 < (\alpha - 1)k_1 + d \le \alpha k_1$$ 

the $d_1$th element is selected from stratum 2. 
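
The selection rule just described can be sketched as follows. The function name `linked_selection_probabilities` and its arguments are hypothetical, and the sketch rests on one reading of the rule: the element of stratum 2 is the one whose interval of width $k_2$ contains the position $(\alpha - 1)k_1 + d$. The function enumerates every equally likely pair $(\alpha, d)$ and returns the exact joint probabilities.

```python
from collections import defaultdict
from fractions import Fraction as F

def linked_selection_probabilities(n1, n2, k2):
    """Exact joint probabilities of (element of stratum 1, element of stratum 2)."""
    if (n2 * k2) % n1 != 0:
        raise ValueError("k2 must make n2 * k2 divisible by n1")
    k1 = n2 * k2 // n1                      # measure of size for stratum 1
    joint = defaultdict(F)                  # missing keys start at Fraction(0)
    for alpha in range(1, n1 + 1):          # element drawn from stratum 1
        for d in range(1, k1 + 1):          # integer drawn from 1, ..., k1
            position = (alpha - 1) * k1 + d
            # Element of stratum 2 whose block of width k2 contains the position
            # (integer ceiling of position / k2).
            element = (position + k2 - 1) // k2
            joint[(alpha, element)] += F(1, n1 * k1)
    return dict(joint)

# The worked example of the text: N_1 = 3, N_2 = 4, k_2 = 6, hence k_1 = 8.
joint = linked_selection_probabilities(3, 4, 6)
marginal_2 = [sum(p for (a, e), p in joint.items() if e == element)
              for element in range(1, 5)]
```

In this example each element of stratum 2 has total probability 1/4, while the choice in stratum 2 is tied to the choice in stratum 1 rather than being independent of it.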

It is easy to verify that when the sample is so selected, each element of stratum 
2 has equal probability of being selected. Hence, if we apply this procedure to 
each stratum we have 

$$E x' = \bar{x}.$$ 

Let us evaluate $\sigma_{x'_i x'_j}$ for this type of selection. Now 

$$\sigma_{x'_i x'_j} = E(x'_i - \bar{x}_i)(x'_j - \bar{x}_j)$$ 

where $\bar{x}_i$ is the arithmetic mean of the elements of the $i$th stratum. From Theorem 
2, we then have 

$$\sigma_{x'_i x'_j} = \frac{1}{N_1} \sum_{\alpha=1}^{N_1} E\{[x'_i - E(x'_i \mid x_{1\alpha})][x'_j - E(x'_j \mid x_{1\alpha})] \mid x_{1\alpha}\} 
+ \frac{1}{N_1} \sum_{\alpha=1}^{N_1} [E(x'_i \mid x_{1\alpha}) - \bar{x}_i][E(x'_j \mid x_{1\alpha}) - \bar{x}_j].$$ 

It is easy to see that the method of selection used above implies that the first 
term of $\sigma_{x'_i x'_j}$ vanishes. Furthermore, $\bar{x}_i$ is the arithmetic mean of the conditional 
expectations so that we have reduced the problem to one of determining whether 
the conditional expectations satisfy the conditions for negative correlation or 
expected negative correlation. 

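If we denote $E(x'_i \mid x_{1\alpha})$ by $y_{i\alpha}$, then we need to see whether the sets 
$y_{i1}, \cdots, y_{iN_1}$ and $y_{j1}, \cdots, y_{jN_1}$ are oppositely ordered. Now 

$$(y_{i\alpha} - y_{i\beta})(y_{j\alpha} - y_{j\beta}) = \sum_{g=1}^{N_i} \sum_{h=1}^{N_j} x_{ig}\, x_{jh}\, \epsilon_{ig\alpha\beta}\, \epsilon_{jh\alpha\beta}$$ 

where 

$$\epsilon_{ig\alpha\beta} = P\{x'_i = x_{ig} \mid x_{1\alpha}\} - P\{x'_i = x_{ig} \mid x_{1\beta}\}.$$ 

If $\alpha < \beta$ then, according to the method of selection, 

$$\sum_{g=1}^{s} \epsilon_{ig\alpha\beta} \ge 0, \qquad s = 1, \cdots, N_i - 1,$$ 

while 

$$\sum_{g=1}^{N_i} \epsilon_{ig\alpha\beta} = 0.$$ 

In Theorem 4, we then make the identifications $n = N_i$, $m = N_j$, 
$w_g = \epsilon_{ig\alpha\beta}$, $z_h = \epsilon_{jh\alpha\beta}$ and $a_{gh} = x_{ig} x_{jh}$. 

Then 

$$\delta_{gh} = (x_{ig} - x_{i,g+1})(x_{jh} - x_{j,h+1})$$ 

and hence to have negative correlation between the strata, it is sufficient that 
the sets $x_{i1}, \cdots, x_{iN_i}$ and $x_{j1}, \cdots, x_{jN_j}$ have the type of negative ordering 
represented by $\delta_{gh} < 0$. Similarly, if 

$$\sigma_{gh} = \mathcal{E}(x_{ig} - \mu_i)(x_{jh} - \mu_j), \qquad \mu_i = \mathcal{E}x_{ig},$$ 

then, for expected negative correlation, it is sufficient that 

$$\sigma_{gh} - \sigma_{g,h+1} - \sigma_{g+1,h} + \sigma_{g+1,h+1} \le 0.$$ 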

Of course, these conditions will be satisfied if a concave upwards correlogram 
exists. Hence, if a population consists of N random variables xi , • • ■ , Xn having 
a concave upwards correlogram, then, no matter into what strata these elements 
are classified, provided that the order of occurrence of the elements remains un¬ 
altered, the systematic selection of the elements in the sample can be so planned 
as to yield an estimate having smaller variance than the stratified random selection 
of the elements in the sample even if optimum allocation is used. If more 
than one element is being selected from a stratum under optimum allocation, 
then the systematic selection of the same number of elements will suffice. If not 
only optimum allocation but also optimum definitions of strata are being used 
so that but one element is selected from each stratum, then systematic selection 
according to the scheme described in the section will produce a variance not 
larger than the variance of stratified random sampling. It should be noted, how¬ 
ever, that this does not imply that a ‘hammer and tongs’ use of systematic 
sampling ignoring the strata will produce a smaller variance. There is work to 
be done on what is required for the latter to occur. 



SYSTEMATIC SAMPLING 


345 


It may be noted that the procedure of this example provides an answer to the 
systematic selection of elements from a population whose size is not a multiple 
of the size of sample. 


5. The systematic sampling of clusters with probability proportionate to a
measure of size. It is known [5] that sampling clusters with probability
proportionate to a measure of size often yields considerable reductions in the variance
of the estimates. However, the theory of the systematic selection of several
clusters with probability proportionate to a measure of size has not been worked
out, and it is the purpose of this section to make some contributions to that
theory.

The most frequently used method of sampling clusters with probability
proportionate to size is equivalent to the following. Suppose that the clusters are
denoted by $C_1, \ldots, C_M$ and that to the hth of these M clusters is assigned a
measure of size $P_h$. Form the successive totals $P_1$, $P_1 + P_2$, $P_1 + P_2 + P_3$, $\ldots$,
$P_1 + \cdots + P_M$. If we wish to select m of these clusters, we calculate
$\bar{P}_m = (P_1 + \cdots + P_M)/m$. Then, assuming that $P_j \le \bar{P}_m$, $j = 1, \ldots, M$, we select
an integer with equal probability from $1, \ldots, \bar{P}_m$. Calling that integer $P'$, we
calculate the m numbers $P'$, $P' + \bar{P}_m$, $P' + 2\bar{P}_m$, $\ldots$, $P' + (m - 1)\bar{P}_m$.
If

(5.1) $P_1 + \cdots + P_{h-1} + 1 \le P' + (i - 1)\bar{P}_m \le P_1 + \cdots + P_h$

for any integer i, $i = 1, \ldots, m$, then the cluster $C_h$ is selected for the sample.
Any cluster for which $P_h > \bar{P}_m$ is automatically included in the sample, and if
there are, say, a such clusters, then we calculate $\bar{P}_{m-a}$ for the $M - a$ clusters
remaining after including these a in the sample, and proceed as above.
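
The selection rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_pps` and its argument names are hypothetical, the measures of size are taken to be positive integers whose total is a multiple of m, and the preliminary removal of clusters larger than the interval is left to the caller.

```python
from itertools import accumulate


def systematic_pps(sizes, m, start):
    """Select m clusters systematically with probability proportionate to size.

    sizes : positive integer measures of size P_1, ..., P_M (illustrative input)
    m     : number of clusters to select
    start : the integer P' drawn with equal probability from 1, ..., interval
    Returns the 0-based indices of the selected clusters.
    """
    total = sum(sizes)
    if total % m != 0:
        raise ValueError("this sketch assumes the total size is a multiple of m")
    interval = total // m  # the quantity (P_1 + ... + P_M) / m
    if max(sizes) > interval:
        raise ValueError("larger clusters must first be taken with certainty")
    if not 1 <= start <= interval:
        raise ValueError("start must lie between 1 and the interval")

    cumulative = list(accumulate(sizes))  # successive totals P_1, P_1 + P_2, ...
    selected = []
    h = 0
    for i in range(m):
        point = start + i * interval
        # Advance to the first cluster whose successive total reaches the point,
        # i.e. P_1 + ... + P_{h-1} + 1 <= point <= P_1 + ... + P_h.
        while cumulative[h] < point:
            h += 1
        selected.append(h)
    return selected


if __name__ == "__main__":
    sizes = [3, 5, 2, 4, 4, 2]
    print(systematic_pps(sizes, m=4, start=2))
```

Running the function over every admissible start shows each cluster appearing in exactly as many samples as its measure of size, which is the sense in which the selection is proportionate to size.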

In deriving the variance of the estimate we shall use, we interpret that estimate
as a stratified sampling estimate. Although it is easy to obtain the expected
value of the estimate without that interpretation, we shall need it later in the
derivation of the variance, and hence we give it here to shorten the total
presentation a little.

Suppose that clusters $C_1, \ldots, C_{k_1}$ are such that

$P_1 + \cdots + P_{k_1 - 1} < \bar{P}_m \le P_1 + \cdots + P_{k_1}$.

Then we define stratum 1 to consist of clusters $C_1, \ldots, C_{k_1}$. It is easy to see
that if the above sampling method is used then

$P\{C_h \text{ is selected from stratum 1},\ h < k_1\} = P_h / \bar{P}_m$,

$P\{C_{k_1} \text{ is selected from stratum 1}\} = (\bar{P}_m - P_1 - \cdots - P_{k_1 - 1}) / \bar{P}_m$.

Furthermore, suppose that clusters $C_{k_1}, \ldots, C_{k_1 + k_2}$ are such that

$P_1 + \cdots + P_{k_1 + k_2 - 1} < 2\bar{P}_m \le P_1 + \cdots + P_{k_1 + k_2}$.




Then we define stratum 2 to consist of clusters $C_{k_1}, \ldots, C_{k_1 + k_2}$. It is easy
to see that if the above sampling method is used, then

$P\{C_{k_1} \text{ is selected from stratum 2}\} = (P_1 + \cdots + P_{k_1} - \bar{P}_m) / \bar{P}_m$,

$P\{C_{k_1 + h} \text{ is selected from stratum 2},\ 1 \le h < k_2\} = P_{k_1 + h} / \bar{P}_m$,

$P\{C_{k_1 + k_2} \text{ is selected from stratum 2}\} = (2\bar{P}_m - P_1 - \cdots - P_{k_1 + k_2 - 1}) / \bar{P}_m$.

Since $P_{k_1} \le \bar{P}_m$ we remark that it is impossible that $C_{k_1}$ be selected from both
stratum 1 and stratum 2.

In general, if clusters $C_{k_1 + \cdots + k_{i-1}}, \ldots, C_{k_1 + \cdots + k_i}$ are such that

(5.2) $P_1 + \cdots + P_{k_1 + \cdots + k_i - 1} < i\bar{P}_m \le P_1 + \cdots + P_{k_1 + \cdots + k_i}$,


then the ith stratum consists of these $k_i + 1$ clusters, and we define the
probabilities $P_{i\alpha}$, $\alpha = 0, \ldots, k_i$, by the equations


P* = P(C*,. h . .+*<_, is selected from stratum i) 

_?!+••■+ P*.+.. — (* — l)Tm 

-«- » 


(5.3) 


Pi* — P{C k is selected from stratum i, h + -j- fc,_i < h < h ■+• ■ • • -f- ki} 



a = h - k, - ■ • > — 


Pn, = P{Ct 1+ .. + i, is selected from stratum i} 

_ ~ Pi ~ * • • ~ Pk x + ■ +fc<-l 

K - 


We remark that 


(5-4) Pv-tt,^ + P* = 

X m 

Now, let the elements of the population be $x_{hj}$, $h = 1, \ldots, M$, $j = 1, \ldots,
N_h$, and let the arithmetic mean of the hth cluster be denoted by $\bar{x}_h$. Since the
$N_h$ are usually unknown but the measure of size, $P_h$, is known, we sample, not
with probability proportionate to the $N_h$, but with probability proportionate
to the $P_h$. We shall denote the clusters of the ith stratum by $C_{i0}, \ldots, C_{ik_i}$,
making the identification

(5.5) $C_{i\alpha} = C_{\alpha + k_1 + \cdots + k_{i-1}}$.

Furthermore, the number of elements of the clusters are denoted by » • • • , 





N,h ,, and the means of the clusters by , • ■ • , x& t , where 

ta ct^-k j-j* ■ • ■■f.fci ..1 

(5-6) X%a — 

so that Xm — *Ti—and N »o = ,*—j ,^ = 1, *"‘j rn. 

Furthermore, we define 

(5.7) Xta = W,a X ia /P ta ~ 

We define the mean of the ith stratum to be 

*• 

(5.8) X, = ^,P ta X(a/Tm, 

a-0 


and the variance of the ith stratum to be 



Then, if the mean and variance of the population are defined to be

(5.10) 2 = E Ph W? 

h -1 

and 

(5.11) <r 2 = E ^ (5* - if, 

J1-1 r 

it is easy to verify that 

(5.12) 2 = - E f, 

m 1-1 

and 

(5.13) f = - £ <r! H- E (^1 — if- 

m ,-1 to ,-1 

An unbiased estimate of the total of a characteristic. We shall see that we can
obtain an estimate of x, where

$x = \sum_{h=1}^{M} \sum_{j=1}^{N_h} x_{hj}$,

i.e. x is the total of the elements of the population. Since N is unknown, the
estimate of $\bar{x}$ that is used is the ratio of unbiased estimates of x and N. It is well
known that this ratio is usually biased. Since we are not making any study of
ratio estimates here we will not derive the approximation to the variance of this
estimate. It may be remarked that it can be obtained by a simple extension of
the results here given.





Let us agree that the general form of the estimate will be as follows:

If the jth cluster of the population is selected we shall subsample $n_j$ elements
from it. The total of the values of the characteristic for these $n_j$ elements we
denote by $x'_j$. Furthermore, we denote by $n'_i$ the total number of elements
subsampled from the ith stratum, or, what is the same, from the cluster selected
from the ith stratum; and by $x''_i$ the total of these elements. Thus, if the jth
cluster is the ith selected, then $n'_i = n_j$ and $x''_i = x'_j$. We define our estimate $x''$
of x, the total of the population, to be

(5.14) $x'' = K(x''_1 + \cdots + x''_m)$.

Then, if $K = P/mn$ and $n_h = nN_h/P_h$, it is easy to see that $x''$ is an
unbiased estimate of x.
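
The unbiasedness claim can be checked by exact arithmetic. The sketch below is illustrative only and rests on assumptions not spelled out at this point in the text: each cluster is taken to enter the sample with probability $mP_h/P$ (where P is the total of the measures of size and no $P_h$ exceeds $P/m$), the subsample within a cluster is taken to be a simple random one so that its expected total is $n_h$ times the cluster mean, and a fractional $n_h$ is treated as an expected subsample size. The function name `expected_estimate` is hypothetical.

```python
from fractions import Fraction


def expected_estimate(cluster_values, sizes, m, n):
    """Expected value of K * (sum of subsample totals) with K = P / (m n).

    cluster_values : list of lists of integer element values, one list per cluster
    sizes          : measures of size P_h, one per cluster
    m              : number of clusters selected
    n              : constant in the subsampling rule n_h = n * N_h / P_h
    """
    P = sum(sizes)
    K = Fraction(P, m * n)
    expectation = Fraction(0)
    for values, P_h in zip(cluster_values, sizes):
        N_h = len(values)
        inclusion = Fraction(m * P_h, P)     # assumed chance that cluster h is selected
        n_h = Fraction(n * N_h, P_h)         # subsample size rule from the text
        mean_h = Fraction(sum(values), N_h)  # cluster mean
        expected_subtotal = n_h * mean_h     # expected total of the n_h subsampled elements
        expectation += inclusion * K * expected_subtotal
    return expectation


if __name__ == "__main__":
    clusters = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]
    sizes = [3, 5, 2, 10]
    print(expected_estimate(clusters, sizes, m=2, n=2))
    print(sum(sum(c) for c in clusters))
```

Each cluster contributes exactly its own total to the expectation, because the factors $mP_h/P$, $P/mn$ and $nN_h/P_h$ cancel to $N_h$; this holds whatever measures of size are used.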

The variance of the estimate. We may calculate the variance of x" where 

(5.15) x" = P m (xi + • • ■ + x'm) and x" = x'jn. 


Now, by Theorem 2, 


(5.16) 


2 n j J 

cr X " — “T 


M) 


where A* has been defined above We shall not evaluate Eirl*. \a since this in¬ 
volves no new problem for subsampling methods using random or systematic 
methods, or methods using probability proportionate to size. 

From (5.15) it follows that 


(5.17) E*(x"\A) = ? m (5n+ ••• + $L) 


or, in other words, $E^*(x'' \mid A)$ is the estimate we would have if the clusters in the
sample were completely enumerated. We shall denote the second term of (5.16)
by $\sigma_B^2$. Then,


(5.18) 


2 

C Tr 


= P 2 


Z «rj; + Z 

—'i •r'l 


Now 

(5.19) 


fet P 

2, V * ~ \2 2 

2—1 ^5 \ftta & i ) — O’ i ■ 

a- a 0 r m 


To calculate ,i^j, we shall use Theorem 1, 

(5.20) aiiij = E(x'i - af,)($J - xj) - E{(&' { - x { )E*[(xi - x,) |*<). 

To calculate E*[(x'j — xf) \ x[] we begin by noting that 

(5.21) E*[(Z - f y ) l sfl s E*[(Z - x,) | (7,1 

where C* is the random event having /c,- -f- 1 possible states which are the selec¬ 
tions of (7,o, ■ • , C,k ( as the sample clusters of the fth stratum. Now if (7; a is 
one of the clusters of the zth stratum let us calculate 

(5 22) «[(*;-?,) 





We begin by determining which of the clusters of the jth stratum are possible
sample clusters, if we know that $C_{i\alpha}$ is selected from the ith stratum. Since the
sizes of strata i and j are both $\bar{P}_m$ it follows that there exist integers $\beta_0$ and $\beta_1$
such that

P:0 + • * • + P j0o-l < P,1 + * • ‘ + Pi ,*-1 < Pji 4" * * * + Pjfia > 

and 

PjO + * * * + Ptfi-l < Pa + • • ■ + Pia < Pjl + • • • + P>0! . 

Hence, if we know that C Ia has been selected from stratum i, it follows that we 
must select one of the clusters 

C*, P o+l J * * ' 1 ^>01 

from stratum j and 

P{(7>0 is selected | C ia is selected} = P' 0 /P ta , 0 = ft , ft + 1, • • • , ft 

= 0, otherwise, 

where 

P/0o = Pji + * • • + P>0o — Pa — ■ ■ • , Pi.«-l 
P» = P,e,P = P .+ 1, ••• ,Bi - 1 
P)0! = P»1 + * * • + Pi* — Pjl ~ • ' ‘ — Prft-1 , 

and 

01 

23 P>3 = Pm 

0“0O 

Then 

(5 23) E[{x\ - to | C„) = S„« - 1,] 

where 

(5.27) ®,|« - £ 

0-00 Pi* 

Hence, substituting in (5,20), we see that 

crzlz'j = E(xi — af,)($y|, — X,) 

where 2/i» = $y| a if C xa is selected from stratum i. Then it follows that 
(5-25) 0 *[it = 23 fr (5,« - x ,)(*>[« - Ty). 

a—0 r yn 

Obviously, the conditional expectation can be eliminated from (5.25) by using
(5.23) but no gain in simplicity or generality thus occurs.





It would be possible to obtain the variances and covariances of the $x'_i$ by
listing all possible samples in any special case. To make this general would only
require writing the necessary notation.

Substituting in (5.18) we see that 

= Pm (S 


where is given by (5.25). 

It follows that if we use the fact that 2 ( x * “ x ) = 0, then we have 

i-i 


3 

<T B 


m U. 

i—l o—0 * m 


a—0 


*c)(tCy|a ™' X)f 


or, returning in part to the "unstratified” notation 

(5.26) A = — Z ^ (xh - x) 1 + - £ £ ~ (x.- a - 5)(xyi- - *)■ 

m k-i Jr m .yy a -o .r 

By combining terms of the second part of (5.26), generalizations of the formulae
obtained in [1] are easily obtained.

Still another means of writing tr\ is 

(5.27) a\ - — la — + £ £ ^ — *)(*y|«-J)| 

TO ( I *j a-o r ) 

where 


i 

(Tb.«. 




TO i-1 


which shows both sources of changes in efficiency as compared with sampling 
with probability proportionate to size, and replacing the clusters obtained. (It 
is, of course, obvious that PV/rn is the variance of E*(x" | A), if we assume the 
m clusters to have been selected with probability proportionate to size, each
selected cluster being replaced before the next is selected.) 

By considering (5.26) and (5.27) it is clear that systematic sampling with
probability proportionate to size will be more efficient than sampling p.p.s.
with replacement under much the same conditions as when we sample single
elements. The details are omitted. They depend on applying the Lemma and
Theorem 4. The summary of the conditions is: If we sample systematically with
p.p.s., and if the two sets $X_{i1}, \ldots, X_{ik_i}$ and $y_{j1}, \ldots, y_{jk_i}$ are monotone, one being
monotone non-increasing and the other monotone non-decreasing, then the
covariance between the ith and jth strata will be negative, and thus gains are made as
compared with independent sampling from the strata.

If we define 

<Ta0 = &(Xia ~ 6>X, a )(Xtf — &Xjfi) 





then the concavity condition for systematic sampling p.p.s. to yield a smaller
variance than independent sampling p.p.s. from each stratum is, if $\alpha < \beta$,


o o ^ o o ^ 

o al — <f Tl — a al ~ a -t* — 


Tdkf — S 


< 0. 


6. The systematic sampling of clusters of equal sizes. Let us now suppose
that our population consists of clusters of elements, the clusters being of equal
size, i.e. containing the same number of elements. To be specific, let the
population consist of M clusters, where M = cm and each cluster contains N elements,
where N = kn. Then, the value of the characteristic being measured for the
$\alpha$th element of the ith cluster may be denoted by $x_{i\alpha}$, and the total of all the
elements of the ith cluster may be denoted by $x_i$. The arithmetic mean of the
population is $\bar{x}$, and thus

$M\bar{x} = \sum_{i=1}^{M} \bar{x}_i$,

where

$N\bar{x}_i = x_i$.

a. Complete enumeration of clusters in sample. First, suppose that we wish to
estimate $\bar{x}$ by $\bar{x}'$, where $\bar{x}'$ is the arithmetic mean of the sample obtained by
selecting a systematic sample of m of the M clusters, and enumerating all elements
within each cluster in the sample. Then, we may write

(6.1) $m\bar{x}' = \sum_{i=1}^{m} \bar{x}'_i$,

where $\bar{x}'_i$ is the mean of the ith cluster selected for the sample. From [1], it follows
then that

$\sigma^2_{\bar{x}'} = \frac{\sigma_c^2}{m}\{1 + (m - 1)\rho_c\}$,

where $M\sigma_c^2 = \sum_{i=1}^{M} (\bar{x}_i - \bar{x})^2$, and $\rho_c$ is defined as $\rho_k$ in [1, p. 6], but with $\bar{x}_i$ in
place of $x_i$. Now from the theory of the random sampling of clusters it follows
that

$\sigma_c^2 = \frac{\sigma^2}{N}\{1 + (N - 1)\rho\}$,

where $\sigma^2$ is the variance of the population, i.e.

$MN\sigma^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (x_{ij} - \bar{x})^2$,

and $\rho$ is the intraclass correlation coefficient of elements within clusters, i.e.

$\sigma^2\rho = \sigma_c^2 - \sigma_a^2/(N - 1)$,

where

$MN\sigma_a^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} (x_{ij} - \bar{x}_i)^2$.

Thus

(6.2) $\sigma^2_{\bar{x}'} = \frac{\sigma^2}{mN}\{1 + (N - 1)\rho\}\{1 + (m - 1)\rho_c\}$.

Of the three factors in (6.2), $\sigma^2/mN$ is the variance of a random sample of size
mN selected with replacement; $1 + (N - 1)\rho$ is the factor arising from the use
of clusters; and $1 + (m - 1)\rho_c$ is the factor arising from the fact that the clusters
are sampled systematically.
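
The three-factor form of (6.2) can be compared numerically with the variance obtained by listing every possible systematic sample of clusters. The sketch below is an illustration under assumptions of mine: `rho` is computed as the average product of deviations over pairs of distinct elements in the same cluster, and `rho_c` as the analogous average over pairs of cluster means falling in the same systematic sample, which is how I read the definition cited from [1]. The function names are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def variance_by_formula_and_enumeration(clusters, m):
    """Return (three-factor formula, variance from enumerating all systematic samples).

    clusters : list of M equal-sized clusters (lists of numbers) in population order
    m        : number of clusters per systematic sample; M must equal c * m
    """
    M, N = len(clusters), len(clusters[0])
    c = M // m  # sampling interval between selected clusters
    grand = mean(x for cl in clusters for x in cl)
    sigma2 = mean((x - grand) ** 2 for cl in clusters for x in cl)
    cluster_means = [mean(cl) for cl in clusters]
    sigma2_c = mean((xb - grand) ** 2 for xb in cluster_means)

    # Intraclass correlation of elements within clusters.
    cross = sum((cl[j] - grand) * (cl[l] - grand)
                for cl in clusters
                for j in range(N) for l in range(N) if j != l)
    rho = cross / (M * N * (N - 1) * sigma2)

    # Average correlation between cluster means in the same systematic sample.
    samples = [cluster_means[start::c] for start in range(c)]
    cross_c = sum((y[a] - grand) * (y[b] - grand)
                  for y in samples
                  for a in range(m) for b in range(m) if a != b)
    rho_c = cross_c / (c * m * (m - 1) * sigma2_c)

    formula = sigma2 / (m * N) * (1 + (N - 1) * rho) * (1 + (m - 1) * rho_c)
    enumerated = mean((mean(y) - grand) ** 2 for y in samples)
    return formula, enumerated


if __name__ == "__main__":
    data = [[1, 2], [3, 5], [4, 4], [8, 6], [7, 9], [10, 12]]
    print(variance_by_formula_and_enumeration(data, m=3))
```

The two returned values agree up to rounding, since under these definitions the factorization is an algebraic identity rather than an approximation.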

b. Stratification and subsampling. When we consider the possibilities of
stratification and subsampling, the number of possible designs increases tremendously.
For example, it would be simple to calculate the variances of arithmetic means
obtained by stratifying the population, selecting sampling units with probability
proportionate to size, subsampling systematically, again subsampling
systematically and finally subsampling at random. However, such studies may be left
to be made in connection with the practical problems in which they are to be
used. Rather than attempt to consider many of the possibilities that might
arise in practice, we shall here give only the results of the systematic subsampling
of a systematic sample. The variances of many other designs may easily be
obtained by means of Theorems 1 and 2.

Suppose now that from each of a systematically selected sample of m clusters 
we subsample, systematically, n elements. Then, let our estimate of x be x' 
where, if xL is the ath selected element from the ith sample cluster, then 



From Theorem 2, it follows at once that 
(6.3) cl = -£ {1 + {N - Dp){ 1 + (m - l)p.) {I + (m - DjSil, 

mn 1WI ] mtl 

where $\sigma_i^2$ is the variance within the ith cluster and $\bar{\rho}_i$ is the average serial
correlation within the ith cluster as defined in [1, p. 6]. It is simple to calculate the
variance of $\bar{x}'$ also when the sub-sampling is done by considering the m clusters
in the sample as one population from which a systematic sample is selected. This
is the case that occurs when a sample of blocks is selected and all the households
on the sample blocks are listed serially, a systematic sample then being selected
from the lists. However, for our present purposes it is the analysis of (6.2) that
is important and we now turn to a brief discussion of (6.2).

The most important conclusion to be drawn from (6.2) is that the systematic
selection of clusters, even when systematic selection is desirable, may not
compensate for the increase in variance caused by the use of clusters. Systematic
selection will provide the same relative gains but these gains may not be large
enough to produce the inequality

{1 + (N — 1 )p] {1 + (m - 1 )p £ ) < 

A problem that we have not worked through is the following: By regarding 
the elements of the population as random variables, we obtain conditions on the 
average correlations among elements of a single cluster as well as on the average 
correlations among elements of different clusters that enable us to state where the 
systematic sampling of clusters of equal sizes may be expected to yield a smaller 
variance than the random or stratified random sampling of clusters or of
individual elements. This solution should be straightforward.

c. Systematic sampling in two dimensions. Systematic sampling in two
dimensions occurs in such practical problems as the selection of a sample of blocks from
a city or the selection of a sample of plots from a field.

In selecting blocks from a city, the procedure most often followed effectively
reduces the problem to one dimensional form by first numbering the blocks of
the city or a part of it, in serpentine fashion beginning, say, in the upper right
corner of a map of the city and numbering the blocks in the top row from right
to left, continuing the numbering of the second row from left to right and so on.
Then a systematic sample of these block numbers, and hence, of the blocks
themselves is selected. Clearly, this procedure should not be the most efficient
if neighboring blocks are highly correlated, since, to cite an unrealistic
possibility, the possible samples might turn out to be columns of blocks of the city.
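
The serpentine numbering and the degenerate case mentioned above can be illustrated with a small sketch. It assumes an idealized rectangular city of `rows` by `cols` blocks; the function names and the grid layout are hypothetical and serve only to show how a sampling interval equal to twice the number of columns makes every possible sample fall in a single column.

```python
def serpentine_order(rows, cols):
    """List block coordinates (row, column) in serpentine numbering order.

    Numbering starts in the upper right corner; the top row runs right to left,
    the next row left to right, and so on alternately.
    """
    order = []
    for r in range(rows):
        columns = range(cols - 1, -1, -1) if r % 2 == 0 else range(cols)
        order.extend((r, c) for c in columns)
    return order


def systematic_blocks(rows, cols, interval, start):
    """Blocks hit by a systematic sample of the serpentine numbers.

    start is the 0-based position of the first selected block number.
    """
    return serpentine_order(rows, cols)[start::interval]


if __name__ == "__main__":
    # With 3 columns and an interval of 6, every sample lies in one column.
    for start in range(6):
        print(start, systematic_blocks(6, 3, 6, start))
```

Changing the interval to a value that is not a multiple of twice the number of columns spreads each sample across columns, which is the situation the serpentine numbering is meant to produce.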

A second two dimensional systematic sampling procedure might be that of
selecting a systematic sample of the rows and a systematic sample of the columns,
thus obtaining a grid sample. This design too is inefficient when there is a
"fertility gradient" along rows or along columns.

The reason for the inefficiency of both of these procedures can be found by
examining the formulae for the variances of systematic samples. If the numbering
is serpentine, then it becomes illogical to expect that the correlogram is concave
upwards and sharp deviations from that pattern may occur. In the grid design,
which is a special case of the systematic sampling of clusters with systematic
subsampling, we may examine (6.3) and note that the intra-class correlation
coefficient $\rho$ may be large enough for $\sigma^2_{\bar{x}'}$ to be large even when $\rho_c$ is negative.

Clearly, (6.3) suggests that the possible samples be so defined that $\rho$ is as
small as possible. In square fields this might be attained by defining the possible
samples to be plots of a Knut Vik square having the same treatment, and
similar definitions of possible samples could easily be given for irregular fields.
This subject is, however, left for further study. 3


3 One of the referees of this paper has drawn the author's attention to an article [6],
the data of which, especially Table 3, are in accordance with the opinions expressed above.





REFERENCES 

[1] W. G. Madow and L. H. Madow, "On the theory of systematic sampling, I," Annals
of Math. Stat., Vol. 15 (1944), pp. 1-24.

[2] L. H. Madow, "Systematic sampling and its relation to other sampling designs," Am.
Stat. Assn. Jour., Vol. 41 (1946), pp. 204-217.

[3] W. G. Cochran, "Relative accuracy of systematic and stratified random samples for a
certain class of populations," Annals of Math. Stat., Vol. 17 (1946), pp. 164-177.

[4] G. H. Hardy, J. E. Littlewood and G. Polya, Inequalities, Cambridge University
Press, London and New York, 1934, p. 43.

[5] M. H. Hansen and W. N. Hurwitz, "On the theory of sampling from finite
populations," Annals of Math. Stat., Vol. 14 (1943), pp. 333-362.

[6] P. G. Homeyer and C. A. Black, "Sampling replicated field experiments on oats for
yield determinations," Soil Sci. Soc. Proc., Vol. 11 (1946), pp. 341-344.



PROBLEMS IN PLANE SAMPLING 


By M. H. Quenouille 

Rothamsted Experimental Station, Harpenden, England

1. Summary. After consideration of the relative accuracies of systematic and 
stratified random sampling in one dimension the problem of estimation of linear 
sampling error is discussed. 

Methods of sampling an area are proposed, and expressions for the accuracies 
of these methods are derived. These expressions are compared for large samples, 
with special reference to correlation functions which appear to be theoretically 
and practically justified, and systematic sampling is found to be more accurate 
than stratified random sampling in many cases. Methods of estimating sampling 
errors are again considered, and examples given. The paper concludes with 
some remarks on the problem of trend in the population sampled. 


2. Accuracy of systematic and stratified random samples in one dimension.
W. G. Cochran [1] has given expressions for the variances of the means of samples
of size n drawn from a population $x_1, x_2, \ldots, x_{nk}$ when the method of sampling is
random (r), stratified random (st) and systematic (sy). He assumes the elements
$x_1, \ldots, x_{nk}$ to be drawn from a population in which

$E(x_i) = \mu$, $E(x_i - \mu)^2 = \sigma^2$, $E(x_i - \mu)(x_{i+u} - \mu) = \rho_u \sigma^2$,


where $\rho_u \ge \rho_v \ge 0$ whenever $u < v$, and derives the expressions


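(1) $\sigma_r^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left[1 - \frac{2}{kn(kn - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u\right]$

(2) $\sigma_{st}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left[1 - \frac{2}{k(k - 1)} \sum_{u=1}^{k-1} (k - u)\rho_u\right]$

(3) $\sigma_{sy}^2 = \frac{\sigma^2}{n}\left(1 - \frac{1}{k}\right)\left[1 - \frac{2}{kn(k - 1)} \sum_{u=1}^{kn-1} (kn - u)\rho_u + \frac{2k}{n(k - 1)} \sum_{u=1}^{n-1} (n - u)\rho_{ku}\right]$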
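
The three variance expressions are linear in the serial correlations and are easy to evaluate for any assumed correlogram. The sketch below writes them out in the standard form given by Cochran for a population of nk elements with one element per stratum of k, and checks the systematic formula against a direct computation from the covariance structure. The function names, the choice of an exponential correlogram in the example, and the small values of n and k are all illustrative assumptions.

```python
def cochran_variances(rho, n, k, sigma2=1.0):
    """Variances of the sample mean under random, stratified and systematic sampling.

    rho    : function giving the serial correlation rho(u) for lag u >= 1
    n, k   : sample size and sampling interval (population size is n * k)
    sigma2 : variance of a single element
    """
    N = n * k
    base = sigma2 / n * (1 - 1 / k)
    s_all = sum((N - u) * rho(u) for u in range(1, N))
    s_within = sum((k - u) * rho(u) for u in range(1, k))
    s_spaced = sum((n - u) * rho(k * u) for u in range(1, n))
    var_random = base * (1 - 2 / (N * (N - 1)) * s_all)
    var_stratified = base * (1 - 2 / (k * (k - 1)) * s_within)
    var_systematic = base * (1 - 2 / (N * (k - 1)) * s_all
                             + 2 * k / (n * (k - 1)) * s_spaced)
    return var_random, var_stratified, var_systematic


def systematic_variance_direct(rho, n, k, sigma2=1.0):
    """Expected squared error of the systematic mean, averaged over the k starts."""
    N = n * k

    def cov(i, j):
        return sigma2 * (1.0 if i == j else rho(abs(i - j)))

    total = 0.0
    for start in range(k):
        # Weights of (sample mean - population mean) as a linear form in the x_i.
        w = [-1 / N] * N
        for a in range(n):
            w[start + a * k] += 1 / n
        total += sum(w[i] * w[j] * cov(i, j) for i in range(N) for j in range(N))
    return total / k


if __name__ == "__main__":
    correlogram = lambda u: 0.6 ** u
    print(cochran_variances(correlogram, n=4, k=3))
    print(systematic_variance_direct(correlogram, n=4, k=3))
```

For the exponential correlogram used here the systematic variance is the smallest of the three and the random variance the largest, in line with the ordering Cochran obtained for correlograms that are concave upwards.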


Using these expressions, which are linear functions of the $\rho_u$, Cochran compares
the relative efficiencies of the methods of sampling for several types of
correlogram. It is worth noting that (1), (2) and (3) can be derived under more general
conditions than Cochran considered. If we assume that (a) each $x_i$ is a sample
from a population with mean $\mu_i$ and variance $\sigma_i^2$, (b) that $\mu_i$ is distributed about
mean $\mu$ with variance $\sigma^2$, (c) that $E(\mu_i - \mu)(\mu_j - \mu) = \rho_{ij}\sigma^2$ and (d) that

1 i n— 

p u = i -— 2 Pi.i+u , then it is not difficult to show that (1), (2) and (3) 

Ktl 1/ !■! 




1 / l\ 1 ^ 71 

require the addition of a superposed variation-( 1 — - \ • 22 on to the right- 

hand side of the equations. Thus it should be remembered that Cochran's
results give theoretical maxima to the relative efficiencies of the various methods
of sampling, while $\rho_u$ is the mean correlation between samples u apart. This
result is perhaps interesting in connection with sampling for, say, insect
infestation, when at each point there will be a mean level of infestation and the sample
will be distributed in a Poisson distribution about this mean. Then the superposed
variation is


1 

ft 




If we are sampling a continuous process 1 , for n large we can write down the 
integral equivalents of (1), (2) and (3) 


a 

o> 


2 

O' 


n 


(4) 

(6) *'• ~ l [ X 2 l pl,Su + 2 £ Pd “] 

where $\rho_u$ is the mean correlation between successive elements of the sample, u
apart and d is the mean distance between samples. We have thus



which can often be used to investigate, quickly and roughly, with the aid of a
graph the difference between the efficiencies of stratified random and systematic
sampling. Figure 1 shows how this is done for four types of correlogram.

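For a continuous Markoff scheme, we have $\rho_u = \rho^u$ and

$\sigma_{st}^2 \sim \frac{\sigma^2}{n}\left[1 + \frac{2}{\log \rho^d} + \frac{2}{(\log \rho^d)^2} - \frac{2\rho^d}{(\log \rho^d)^2}\right]$,

$\sigma_{sy}^2 \sim \frac{\sigma^2}{n}\left[1 + \frac{2}{\log \rho^d} + \frac{2\rho^d}{1 - \rho^d}\right]$,

which agree with Cochran's results.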
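
For the Markoff case the two large-sample variances differ only in their last term, so the comparison reduces to two multipliers of $\sigma^2/n$ that depend on $\rho^d$ alone. The sketch below evaluates them, taking the stratified multiplier as $1 + 2/\log\rho^d + 2(1 - \rho^d)/(\log\rho^d)^2$ and the systematic multiplier as $1 + 2/\log\rho^d + 2\rho^d/(1 - \rho^d)$; these are the forms I obtain from the integral expressions for $\rho_u = \rho^u$, and the function name is hypothetical.

```python
import math


def markoff_factors(rho_d):
    """Multipliers of sigma^2 / n for stratified and systematic sampling.

    rho_d : the correlation rho**d between points one sampling interval apart,
            with 0 < rho_d < 1
    Returns (stratified multiplier, systematic multiplier).
    """
    if not 0 < rho_d < 1:
        raise ValueError("rho_d must lie strictly between 0 and 1")
    log_rd = math.log(rho_d)  # negative for 0 < rho_d < 1
    common = 1 + 2 / log_rd
    stratified = common + 2 * (1 - rho_d) / log_rd ** 2
    systematic = common + 2 * rho_d / (1 - rho_d)
    return stratified, systematic


if __name__ == "__main__":
    for rho_d in (0.1, 0.5, 0.9):
        st, sy = markoff_factors(rho_d)
        print(rho_d, round(st, 4), round(sy, 4), round(sy / st, 4))
```

Over the range of $\rho^d$ tried here the systematic multiplier is always the smaller, and the ratio approaches one half as $\rho^d$ approaches 1.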


3. Replication and the estimation of error. Yates [2] has pointed out the
difficulties attached to the estimation of error for a systematic sample. It will,
however, be worthwhile to investigate this point using the above formulae.

1 In practice we can sample a continuous process only as if it were a discontinuous process
with k large.





For random, stratified random and systematic sampling, if n is large and k is
regarded as constant, then the variance of the estimate of the mean will be of
the form $\sigma^2 F(k)/n$, where F(k) is virtually independent of n. Thus, if we have
any method which provides an estimate of error for the samples it will be possible
to split the series to be sampled into several equal parts (or blocks), to obtain an
estimate of error of the mean of each part and to combine these to obtain a more
accurate estimate of the error of the overall mean. In fact, if n is very large, we
may wish to reduce our number of observations by obtaining estimates of error
from a random selection of these parts. For stratified random sampling, F(k) is
completely independent of n, so that we may combine our estimates of error from
each stratum. This leads us to the commonly used method of taking q randomly
chosen elements per stratum, and combining the sets of variances of q - 1 degrees
of freedom to form an estimate of error. If we make our samples exclusive,
i.e. no two elements can coincide, then this variance has to be multiplied by
1 - q/k to give the estimated variance of the sample mean.
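
A minimal sketch of this estimate of error follows. It assumes equal strata of k elements with q elements drawn without repetition from each, pools the within-stratum variances (each on q - 1 degrees of freedom) by a simple average, divides by the total number of sampled elements to obtain the variance of the mean, and applies the factor 1 - q/k. The function name and the toy data are hypothetical, and the division by the total sample size is my reading of how the pooled variance is turned into a variance of the mean.

```python
from statistics import mean, variance


def stratified_mean_and_variance(strata_samples, k):
    """Sample mean and its estimated variance for q elements per stratum.

    strata_samples : list of lists, the q sampled values from each stratum (q >= 2)
    k              : number of elements in each stratum
    """
    n_strata = len(strata_samples)
    q = len(strata_samples[0])
    sample_mean = mean(x for s in strata_samples for x in s)
    # Pooled within-stratum variance on n_strata * (q - 1) degrees of freedom.
    pooled = mean(variance(s) for s in strata_samples)
    # Variance of the mean of n_strata * q elements, with the factor 1 - q/k
    # for samples in which no two elements can coincide.
    var_of_mean = pooled / (n_strata * q) * (1 - q / k)
    return sample_mean, var_of_mean


if __name__ == "__main__":
    print(stratified_mean_and_variance([[1, 3], [2, 6]], k=4))
```

With q = 1 there are no degrees of freedom within strata, which is why the method needs at least two elements per stratum.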

We can in the same way estimate the variance of the mean of a systematic
sample by using sets of q systematic samples of sufficient length with
randomly-chosen starting points. This sampling will, however, be more difficult to carry
out in practice, and we might consider other methods. Our systematic samples
may be chosen to be invariable in each part or block into which the series is
split so that our sampling procedure involves, in all, only q systematic samples,
or we might follow the method advocated by Yates of choosing our q samples
to be evenly spaced, so that they are subsamples of a larger systematic sample.
Whereas this latter method has simplicity and its possible incorporation into a
more extensive scheme to recommend it, its use has to be very carefully
considered. If we consider the discrete case, we wish to estimate



but any estimate of variance based on q evenly-spaced systematic samples can
contain only terms of the form $\rho_{ku/q}$, and while an estimate of variance based
on q randomly-chosen systematic samples will obviously be limited, it will, in
most cases, be more representative. As an example, suppose we take k = 16
and q = 4; then we can compare the relative occurrences of observing the
correlations $\rho_1, \ldots, \rho_{15}$ in the estimate of variance. Six examples of this are given in
table 1, the random numbers having been drawn from Fisher and Yates' tables,
$\rho_u$ and $\rho_{16-u}$ being shown together, since they occur equally frequently. The
table demonstrates how randomly-chosen samples, even as nearly systematic
as the first two randomly-chosen samples, will avoid systematically sampling the
correlogram. It is obvious that in most cases either method will be fairly good
but the use of this latter will usually be the more accurate. Comparisons are
made in table 2 for various types of correlogram using the samples indicated
in table 1. It is, of course, possible to postulate theoretically many kinds of


2 Throughout this paper $\delta$ is used for the differential sign to prevent confusion with d.





correlogram for which the equal-spaced sets of systematic samples will break 
down, but ultimately we must decide with reference to the types of correlogram 

TABLE 1

Frequency of occurrence of the serial correlations $\rho_1, \rho_2, \ldots, \rho_{15}$ in the estimate of
variance when 4 systematic samples each with spacing 16 units are taken



TABLE 2

Values of $\frac{2}{15}\sum_{u=1}^{15} \rho_u$ as estimated by systematic samples

                                  Evenly-    Systematic samples with random starting points
$\rho_u$                          spaced     1      2      3            4      5            6            Mean   Expected
                                  samples

$1 - 0.2u$ $(u = 1, \ldots, 5)$    0.17      0.27   0.20   0.17         0.30   0.17         0.13         0.21   0.27
$1 - 0.1u$ $(u = 1, \ldots, 10)$   0.53      0.62   0.58   0.53         0.60   0.63         0.63         0.67   0.60
$2^{-u}$                           0.04      0.13   0.12   [illegible]  0.15   [illegible]  [illegible]  0.10   0.13
$2^{-u/4}$                         0.68      0.60   0.04   [illegible]  0.66   [illegible]  [illegible]  0.03   0.66
Kendall's Series 1                -0.14      0.03   0.00   [illegible]  0.16   [illegible]  [illegible]  0.01   0.07


* Naturally the use of this method of estimating the sampling error assumes that the
correlation between the corresponding elements in each part or block into which the series
is split may be neglected, i.e. in this case that the terms $\rho_{16}$ and above are negligible. In

v ■ 

this case pu » 1/16 and consequently the term 2(1*5 2 Pu — H 2 pu M ) — 0.66a required in 

14-1 V— l 

II 

(0) differs slightly from the term A 2^ p u =» 0.85 which we are attempting to estimate. 

experienced. We shall consider this point further, after we have dealt with 
two-dimensional sampling. 
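
The contrast drawn in Section 3 between evenly-spaced and randomly-started systematic samples can be illustrated by counting which lags separate the starting points. The sketch below folds lag u together with lag k - u, as the text does for $\rho_u$ and $\rho_{16-u}$. The function name is hypothetical, and the random starting points in the example are made up for illustration; they are not the ones used in the paper's table 1.

```python
from collections import Counter
from itertools import combinations


def lag_frequencies(starts, k):
    """Count the lags between every pair of starting points, folded modulo k.

    starts : starting points of the q systematic samples (integers)
    k      : spacing of each systematic sample
    Lags u and k - u are counted together under min(u, k - u).
    """
    counts = Counter()
    for a, b in combinations(sorted(starts), 2):
        d = (b - a) % k
        counts[min(d, k - d)] += 1
    return counts


if __name__ == "__main__":
    print(lag_frequencies([0, 4, 8, 12], k=16))  # evenly spaced starts
    print(lag_frequencies([1, 6, 8, 15], k=16))  # hypothetical random starts
```

Evenly spaced starts only ever produce lags that are multiples of k/q, whereas random starts touch a wider selection of lags, which is the sense in which they avoid sampling the correlogram systematically.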

4. Methods of sampling in 2 dimensions. The number of ways in which we
can sample a two-dimensional space 3 is large, since we can employ random,
stratified random or systematic sampling in either direction. Thus we will be
able to consider every possible combination of these methods, e.g. random in
one direction and systematic in another will be denoted by r-sy. Furthermore
we can consider the sets of samples in one direction to be aligned with one
another, or to be independently determined. The suffix 1 will be used to denote
aligned samples while suffix 0 will denote independent samples, e.g. we might
sample according to the system $r_1 sy_0$. Examples of several methods of sampling
are given in Figure 2.


3 We shall, in general, consider our two-dimensional space to be rectangular, but it is
not difficult to draw similar conclusions for an area of any shape.


Fig. 1. Graphical comparison of the efficiencies of systematic and stratified random
sampling for various correlation functions. The thick line gives the function

$f_1(u) = u\rho_u/d$, $0 \le u \le d$,

$f_1(u) = \rho_u$, $d \le u$,

and the dotted line the function

$f_2(u) = \rho_{id}$, $(i - 1)d < u \le id$.

Thus systematic sampling is more or less efficient than stratified random sampling according
to whether the area under the thick line is greater or less than the area under the dotted
line. The most efficient method is indicated on each graph.














5. Accuracy of sampling in two dimensions. Suppose we consider a sample
of $n_1 n_2$ elements drawn from the elements $x_{ij}$ ($i = 1, 2, \ldots, n_1 k_1$; $j = 1, 2, \ldots, n_2 k_2$),
(which form a single finite population drawn from an infinite hypothetical
population), such that the mean spacing in the two directions is $k_1$ and $k_2$.
These parameters will, if necessary, be indicated in brackets after the method of
sampling, e.g. $r_1 sy_0(n_1 k_1, n_2 k_2)$.

Let X denote the mean of a sample formed by the method considered, and
$x'$ a member of this sample. Suppose, also, that the $x_{ij}$ are drawn from a
population in which

$E(x_{ij}) = \mu$, $E(x_{ij} - \mu)^2 = \sigma^2$,

$E(x_{ij} - \mu)(x_{i+u, j+v} - \mu) = \rho_{ijuv}\sigma^2$.

Further we may average $\rho_{ijuv}$ over all possible values of i and j to define $\rho_{uv} =
\rho_{-u,-v}$ by the relation

$\sum_{i,j} \rho_{ijuv} = (k_1 n_1 - |u|)(k_2 n_2 - |v|)\rho_{uv}$.


The purpose of these definitions is to allow us to eliminate the difficulties associated
with the parameters of finite populations by considering this population as
being itself a sample from an infinite population. Cochran employs a similar
device.

5a. Random sampling. It is not difficult to see that 

a\X) = i E(X 2 - Xif = E(X, - m) 2 - E(Xy - p) ( X , - p), 


where Xi and X 2 are independent samples. 
Also 


E(X, - p){X 2 - p) = E{x[ - p)(x 2 - p) 


fci fc 2 th 


1 + 


1 


fci m 712 


Z Z (klUi - [ U I )(fc 2 7l2 ~ 



where the double summation 4 exists over the region S given by | u | < kiUi, 
| v | < h 2 n 2 and excludes u = v = 0. We thus have to evaluate E(X± — p) 2 for 
the different types of random sampling. 

It is easily shown that 
2 


E(X i - pf = 


n\n 2 



_ Hi n 2 — 1 

h i h 2 ni n 2 {kik 2 riin 2 


1) 


Z Z - I u \)(k 2 n 2 — I v I )p„ 


2 r 

= — l + 

niTiz L 


Til 1 


kik 2 ni(kxk 2 nin 2 — 1 ) 

+ 


for r„r v , 

Z Z (*1»1 ~ \ U [)feW2 - |p [W 

2(712 ~ 1 ) ^ „ v 1 

k 2 n 2 (k 2 n 2 - 1) S (&2na V)P0I, J 


4 In general, unless otherwise stated, double summations will exist over the region for
which the coefficients are positive, excluding u = v = 0.





a 

Thrii 


+ 


for r s r 0 , 

[‘ + 2 - i“iH^ - m>- 

**~ht (fc "> - " )r - + h3s5^u£*- - “>-] 

for nn, 


hntihni 


whence 


(7) 


2(r ° ro) njns ( J fcifcj)* 

.fi_1_ 

L h ki niTi^ki k 2 Ui rh 


Z Z (k«i - | w |)(fcjnj - | v |)p,] 


■ (r, r B ) = — (l - c 2 f 1 — ,-r-r--~ 

tii n 2 \ /cife/ L («i «2 — l)fci /( 


fc 2 — 1 


kitiithikihriini ~ 1 ) 


( 8 ) 


•Z Z (fci«-i - I U D (h n } - I V Dp, 

?»-**] 

W0= —(l-r^VF 1 

nin2\ «ife/ L 


&i ki(ni + 7ij — 1) — 1 


(9) 


•Z Z (hni - | u |)(Mi - | v Dp, + 


(fci ki — l)Ai ki m rh{ki kinin* — 1) 
2ki(nt — 1) 


(kiki — 1 )rh(kith — 1 ) 

• £ (hni - v)p„ + - ; -ye £ (Mi - ,)J. 

»-i («j«2 — l)ni(fcini — 1) u-i J 

5b. Stratified random sampling. We can deduce the variances for some methods
of taking stratified samples if $x'_i$, the mean of the elements sampled in the $i$th
stratum, is independent of $x'_j$, since we will then have

$$E(\bar{X} - \bar{x})^2 = E(x'_i - \bar{x}_i)^2/n,$$

where $\bar{x}$ is the mean of the finite population which is sampled. Hence
v 2 (slcr 0 ) = — <r 2 {r 0 r 0 (l, ki ; njh)} 

7l\ 


(IP) =— 

Jii m \ ki ki) L 


hkzniikikane — 1 ) 

•Z Z (h - | u \)(fani - | v |)p,] f 



PROBLEMS IN PLANE SAMPLING 


363 


a(shr 0 ) = — (l -—V s [ 1 - --_ 

niTh\ fa fa/ |_ fafant(fafa — 1) 

(H) *E E ( fc i “ I w |)(**n* - I v |)p„, 

JL 2fa (n, - 1) „ N 1 

+ (fct fe - I)® n* - 1 ) ,tl (fci "* * )p - J ’ 

(12) ^ ^ »i«a( fa fa) ° ^ fafa(fafa — 1) 

•EE ft - |«|)ft - M)*,,]. 

To estimate the variance of other methods of sampling, we will make use of a 
general formula which we might have used to derive the expressions (8)—(12). 

If X{ is any element of the sample X, then 

(x - ^ = ~r E ft' - *) 2 - E (*: - x) 2 ] 


whence 

c(X) = FAX - xf 


= — [e ft - *) J _ e (*; - M) « 

rii ri2 L ninj ^ 

+ r^r E E ft ~ f»)ft - m)'"1 , 

j 


_ fafa^ 1 ^ — i _ _ 

ki fa rii roi L fci 

• (hrii - I v |)p u „J - - 


fanintifafamm — 1) 


E E ft«i - I u |) 


nlth — 1 2 . nxn 2 — 1 


B(:cl - a)ft - a) 


ni m 0 fa fa) * [/ fcx fa ni rw(fa fa — 1) ^ ^ ^ Wl ^ U ^ 

• <fe«. -1. Oh. + -»Efa! - >)(«,' - »)i. 

Thus, provided that we can estimate $E(x'_i - \mu)(x'_j - \mu)/\sigma^2$, the expression (13)
gives the error for all methods of sampling.

As an example, we might deduce the expression (12). If we choose any member
$x'_i$, then a second member $x'_j$ will be located at random with respect to $x'_i$ except
that there will be $k_1 k_2 - 1$ positions in the same stratum as $x'_i$ that $x'_j$ will not be
able to occupy. Thus the expected correlation $E(x'_i - \mu)(x'_j - \mu)/\sigma^2$ will be
given by

(14) S E (*i"i - |« Dft«, - | « |)p„ 

_ klkKniih - 1) ^ 2 (fa - I « Dft - I v Dp-• 





If we substitute (14) into (13), we will obtain expression (12) for the variance of
$st_0 st_0$. In the same manner, we can derive for $st_1 st_1$ the expression

E(x< - - fi ) 


kikiiniiti — 1) l_7ci Ti'a«i« 2 


11 


E E (.hni — [ u \)(hni — | y |)p u 


1 

7cj 7c 2 Tii 


El 


(7-i Jh ~ | u |) (tj - | v |)p,,» 


(15) 


fa &2 ^2 


E 1] (/Cl - ! 14 |)(/i'2«j ~ 


u |)puu "b ^ ^ EZ Ej (7<-’i | w |) 


2(7ci fe t m - 1) 

hniihni — 1) ~i 11 ftl W p 

__ 1) yb /. _ , 2(fcl - 1) *yt 2 

7ci(7ci - 1) l l liJpM + fe m(/c 2 n 3 - 1) & 


(fe — | y |)p uv + 


2(fcl ft 2 


Ou 

(7c 2 n 2 — y)po„ 


2(fc] 7r 2 — 1) 

'**(&» - iT 


(fa • — y)p,.» 


Thus we can evaluate $\sigma^2(\bar{X})$ for all types of stratified random sampling.

5c. Systematic sampling. In a similar manner to that used for stratified random
sampling, we can use (13) to evaluate the variances of systematic sampling.
Values of $E(x'_i - \mu)(x'_j - \mu)$ for three of the possible methods of sampling are
given below. For $sy_1 sy_1$


(16) E{x[ - y)(x[ - p) 


n 1 n 2 (?iin 2 - 1) ^ ^ ,!l I w 1)0*2 I v 


For $sy_1 r_0$
(17) 


E{x[ M p) - k i nin2(nin2 _!)EE O'i - I m I) 


A2 n 2 


• (fa n a | V IW.* ,4 w# ( ni ^ i !)(/. 2 „; 2 _ D E ^ 


For $sy_0 sy_0$


^ ^ M) ~ fa/ca(«x« 2 - 1) [fa fam « 2 ^ E (fa*« - I « I) 

(fans — | v |)p«i, — E E (A - i Hi — | u |)(/c 2 — | v |)p™ 


1 

7ci kz Tl2 


E E (fa | m D(fa n 2 ~ | y |)p„„ + , . £ (fa — | w |) 

A'j /t 2 

■ (fa - |y |)p™ + r~ £ E Oh - | l)(/c 2 - | y \)p hl u,v 

/i2 '^1 


(18) 





— -t - 1 2 (&a — p)p<i« + r~ 2 2 (fo ■— I w DC 7 ^ — I v 

a> 2 f** 1 ! /Cl ^2 

- s t ». - «) p ..1 . 

fcl u-l J 

The derivation of (18) may be compared with that of (15). 

6. Effect of alignment. We can examine the effect of alignment either by 
an examination of the values of the variance of different samples, or by the 
direct use of (13). For random and stratified random sampling, the effect of 
alignment is to increase the variance of the sample by an amount 

$$\sum\sum a_{uv}(\rho_{0v} - \rho_{uv}) + \sum\sum b_{uv}(\rho_{u0} - \rho_{uv}), \quad \text{where } a_{uv} \ge 0,\ b_{uv} > 0.$$

This will be positive for monotonic decreasing correlation functions, and for the 
majority of functions realised in practice. Thus alignment will usually increase 
the variance for random and stratified random samples. 

For systematic samples, the position is more complicated, but, roughly, the 
variance is increased by an amount 

22 duv(.Pkiu,kiv ~ PfciUifcju), 

where a uv > 0 and pn,u,* 2 * is a mean over a rectangle, centre pk^,k iv for u and v 
non-zero, and is a mean over a line, length h centre po, for u zero, (and similarly 
for v zero). Whether this is positive or negative will depend on the correlation 
function, and it will have to be investigated for the types of correlation function 
which are encountered. 

7. Limiting forms. For a continuous process, when $n_1$ and $n_2$ are large, we
may, in the same manner as for linear sampling, obtain integral approximations
to the sampling variance, provided that $\sum\sum \rho_{d_1 u, d_2 v}$ converges.

We thus have 


(19) $\sigma^2(r_0 r_0) = \sigma^2(st_0 r_0) \sim \sigma^2/n_1 n_2,$


(20) 

ff2(riro) ~^[ 1 + |j[ P0 ’ 5y l 


(21) 

0.2 r 2 r® 2 r® 

ff(fir i) ~ 1 4- T / PovSv+t puttdu 

7ll ?l 2 [_ Cl 2 Jo Ct\ 

]. 

(22) 

r ”> ~ £i t 1 - id, L LI, _ 1 * |)p - 8u+ 

11 «”*''] 

(23) 

(T 2 r l r dt r dl 

v 2 (sfo sf 0 ) ~ ~~ 1 — / / (di — I u |)(ds — | 

Ti% L Cti C 12 J—dj J—d± 

v |)p ot , 5u 





(24) 


~ ^ [i - L. L (di “ 1 "*“ ,s 

- dh: C L ( * - 1 ' |W(u s ’ + L L ( * ~ 1 “ !) 

2 f* 2 f** 1 

* (da — | w |)p u « 5w St> + j Puo Su ~ ^ J 0 (^ “ M )puo Su 

+ il povSv ~kl (*-•)«•*»]» 

^ r if" r 

ff 2 (s2/i To) ~ - 1 — 7~J / / P«« S U Sv 

0 Wl L “1 fl 2 J-« J-« 

+ i Jl I- PdlU> ’ Sv ~ d* l p0 * sv ] ’ 

(26) Asytm) ~ “ [,£. j£ ^ f M [ a *.«] > 

^(sj/osj/o) ~ “ [i “ j[„ [ di ^~\ u \^ SuSv 

-a/:/>-m)p. 


(25) 




(27) 


+ f f (di — | w ])(d> — | V |)pixv 5"M5 i> 

d\ &2 J— dj J—di 

l « rdt 1 r d t 

+ ji Zj / (da — | v 5v j> / (da I ^ |)po»5r 

Oa M—oo *-dj ^2 

l * r di 1 r dl ”1 

+ Z/ / (di — | u |)p«,d,rSu — "j if (di — | w |)puo5w I • 

d\ gw 00 J— di t*l *—J 

8. Particular case where $\rho_{uv} = \rho_u \rho_v$. We note that, if $\rho_{uv} = \rho_u \rho_v$,$^5$ most of
these forms can be simplified greatly. If we write

$$sy_u = 1 - \frac{2}{d_1}\int_0^\infty \rho_u\,\delta u + 2\sum_{u=1}^\infty \rho_{d_1 u},$$

$$st_u = 1 - \frac{2}{d_1^2}\int_0^{d_1} (d_1 - u)\rho_u\,\delta u,$$

with similar forms for $sy_v$ and $st_v$, and, also

2 r m 2 H 1 

/i = d J pvSv, fi ~ j (d* v’)p,dv f 

2 r* f 2 r^ 1 

^ = d[ Jo PuSU ’ ^ = dl Jo ^ ~ u ) puSu ’ 


f i — 2 P<»ii> i 
•—1 

QO 

f" = 2 z: 


u-1 


PdlU : 


5 A sufficient condition for this to be a valid autocorrelation function is that both $\rho_u$
and $\rho_v$ should be autocorrelation functions.





then we have, for 

example, 


(28) 

<r*(r 1 r 0 ) 

2 

<7 

- 

7li7l2 

(29) 

<r*(nn) 

2 

<7 

/-W . - ■■ 

711^2 

(30) 

o- s (sfo«fo) 

2 

(7 

/V ■ 


(31) 

<r*(fi£i s£i) 

2 

<7 

-- 

rt\ ni 

(32) 

v (si/isyi) 

2 

<7 

nsj - 

riiTh 

(33) 

tr\sy a sy a ) 

2 

<7 

/V - 

riiTh 

From these we get 


<r*(sfi 8<i) — 
(34) 

a (syisyi) 

2 

<7 

- 

Til Tls 



[(Stu St v 

(36) * W) ‘ 

- a 2 (stosU) 

2 

ni7i2 


(1 + /l + ft) i 


(stu sU +f'isy u + ft sy,). 


i(sL st v - sy u sy,) + fi(st u - sy u ) + / 2 (si„ - sy v )], 

2 

tr 

rsj - 

Ylilh 

[|(1 - sy u )(l - sy v ) - (1 - st u )(l - sU)} +fisy u + f"sy t ], 

2 

(36) ff (a<oS<o) - c(syam) -— [/i(sk - sy u ) + fast* - «!/»)]• 

Wins 


The forms (34), (35) and (36) enable us to compare the variances of the samples
in two dimensions by using the one-dimensional results. For most practical
cases, we know that the $f$'s are positive, $st_u > sy_u$ and $st_v > sy_v$, so that

(37) $\sigma^2(st_1 st_1) > \sigma^2(sy_1 sy_1) > \sigma^2(st_0 st_0) > \sigma^2(sy_0 sy_0).$
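
The inequality $st_u > sy_u$ can be illustrated for a Markoff correlation. The sketch below is an illustration under stated assumptions, not a formula taken from the paper: it assumes $\rho_u = \rho_1^{u/d_1}$ (so that $\rho_{d_1 u} = \rho_1^{|u|}$) and the one-dimensional factors $st_u = 1 - (2/d_1^2)\int_0^{d_1}(d_1 - u)\rho_u\,\delta u$ and $sy_u = 1 - (2/d_1)\int_0^\infty \rho_u\,\delta u + 2\sum_{u \ge 1}\rho_{d_1 u}$. The closed forms in the code follow from these by direct integration with $\lambda = -\log \rho_1$; the function names are illustrative.

```python
import math


def st_factor(rho1):
    """Stratified factor: 1 - 2/lam + 2(1 - rho1)/lam^2, with lam = -log(rho1), 0 < rho1 < 1."""
    lam = -math.log(rho1)
    return 1 - 2 / lam + 2 * (1 - rho1) / lam**2


def sy_factor(rho1):
    """Systematic factor: 1 - 2/lam + 2 rho1/(1 - rho1), with lam = -log(rho1), 0 < rho1 < 1."""
    lam = -math.log(rho1)
    return 1 - 2 / lam + 2 * rho1 / (1 - rho1)


for rho1 in (0.1, 0.5, 0.9, 0.999):
    st, sy = st_factor(rho1), sy_factor(rho1)
    print(f"rho1={rho1:5.3f}  st={st:.4f}  sy={sy:.4f}  st/sy={st / sy:.3f}")
```

Under these assumptions the stratified factor exceeds the systematic one throughout $0 < \rho_1 < 1$, and the one-dimensional ratio $st_u/sy_u$ approaches 2 as $\rho_1 \to 1$.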

The values of $\sigma^2(st_0 st_0)/\sigma^2(r_0 r_0)$, $\sigma^2(sy_1 sy_1)/\sigma^2(r_0 r_0)$, $\sigma^2(sy_0 sy_0)/\sigma^2(r_0 r_0)$ and
$\sigma^2(st_0 st_0)/\sigma^2(sy_0 sy_0)$ for $\rho_{d_1 u} = \rho_1^{|u|}$ and $\rho_{d_2 v} = \rho_2^{|v|}$ are given in table 3. It is not
difficult to show that for a given number of samples ($d_1 d_2$ fixed), $\sigma^2(st_0 st_0)$,
$\sigma^2(sy_1 sy_1)$ and $\sigma^2(sy_0 sy_0)$ are least when $\rho_1 = \rho_2$. The expressions tabulated have a
value of 1 for $\rho_1 = \rho_2 = 0$ and tend to limiting values of 0, 2/3, 0, and 2 respectively
as $\rho_1$ and $\rho_2$ tend to 1. It is interesting to note that for $\rho_1$ and $\rho_2$ differing
by more than 0.4 the grid imposed by $sy_1 sy_1$ is less efficient than purely random
sampling. The type of function $\rho_{uv} = \rho_u \rho_v$$^6$ is, however, less likely to be realised


6 For a town survey, we might find the correlation between two points depending on a
within-streets and a between-streets correlation, so that this function could be realised.



TABLE 3

Comparison of the efficiencies of systematic and random sampling for various values of $\rho_1$ and $\rho_2$

[Table body not recoverable in this copy. It tabulated the four ratios named in the text over a grid of $\rho_1$ and $\rho_2$ running from 0 to 1.0 in steps of 0.1.]










































































in practice than a centrally-symmetric function, which is independent of the 
choice of axes. For this reason, we consider next this latter type of function. 


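9. Centrally-symmetric correlation functions. Dedebant and Wehrlé [3]
have given a necessary and sufficient condition for $\rho(u, v)$ to be a correlation
function as

(38) $\rho(u, v) = \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \cos(\omega u - \mu v)\,\delta F(\omega, \mu),$

or alternatively,

(39) $f(\omega, \mu) = \dfrac{1}{4\pi^2}\displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \cos(\omega u - \mu v)\rho(u, v)\,\delta u\,\delta v.$

For a centrally-symmetric correlation function we can put $u = r\cos\theta$, $v = r\sin\theta$;
then $\rho(u, v) = \rho(r)$ and

$$f(\omega, \mu) = \frac{1}{4\pi^2}\int_0^\infty\int_0^{2\pi} \cos\left(r\sqrt{\omega^2 + \mu^2}\cos\theta_1\right)\rho(r)\,r\,\delta\theta_1\,\delta r, \quad \text{where } \theta_1 = \theta + \tan^{-1}(\mu/\omega),$$

$$= \frac{1}{2\pi}\int_0^\infty J_0(r\tau)\rho(r)\,r\,\delta r, \quad \text{where } \tau = \sqrt{\omega^2 + \mu^2}.$$

Thus, if $\rho(u, v)$ is centrally-symmetric, then so is $f(\omega, \mu)$ and conversely, so that
we get

(40) $f(\tau) = \dfrac{1}{2\pi}\displaystyle\int_0^\infty J_0(r\tau)\rho(r)\,r\,\delta r,$

and

(41) $\rho(r) = 2\pi\displaystyle\int_0^\infty J_0(r\tau)f(\tau)\,\tau\,\delta\tau.$

We can thus find suitable forms for $\rho(r)$ and $f(\tau)$. In this connection the formula
$\int_0^\infty J_0(yz)e^{-ay}\,\delta y = 1/(a^2 + z^2)^{1/2}$, $a > 0$, is useful, since we can see that
$\frac{\partial^n}{\partial a^n}(e^{-ay}/y)$ and $\frac{\partial^n}{\partial a^n}(a^2 + z^2)^{-1/2}$ are possible functions for $2\pi f(\tau)$ and $\rho(r)$, although our choice
must be limited by the stochastic nature of $\rho(r)$ as well as by its convergence.
Thus, for example, $a = n = 0$ gives $1/2\pi\tau$ and $1/r$ as spectral and correlation
functions, but these will not converge.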

In the linear case, the Markoff process $\rho(u) = e^{-au}$ had a spectral function
$f(\tau) = a/\pi(a^2 + \tau^2)$ which is a Cauchy distribution in one dimension. If we take a
two-dimensional Cauchy distribution$^7$ as our spectral function we get $f(\tau) =$


7 In the same way as the ordinary Cauchy distribution can be considered as a density 
distribution on a line produced by a point source at a distance a, radiating in all directions, 
so can a two-dimensional distribution be considered as a density distribution on a plane 
from a source at distance $a$.





$a/2\pi(a^2 + \tau^2)^{3/2}$ and $\rho(r) = -\dfrac{\partial}{\partial a}(e^{-ar}/r) = e^{-ar}$. Thus it appears that a generalised
Cauchy distribution will be the spectral function for a generalised Markoff
process.
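
The pairing of $\rho(r) = e^{-ar}$ with $f(\tau) = a/2\pi(a^2 + \tau^2)^{3/2}$ can be checked numerically against the relation $f(\tau) = \frac{1}{2\pi}\int_0^\infty J_0(r\tau)\rho(r)\,r\,\delta r$. The sketch below is a rough illustration only; the quadrature rules, truncation point and function names are choices made for the example, and $J_0$ is evaluated from its standard integral representation rather than from a library.

```python
import math


def bessel_j0(x, m=120):
    """J0(x) = (1/pi) * integral over [0, pi] of cos(x sin t) dt, by the midpoint rule."""
    h = math.pi / m
    return sum(math.cos(x * math.sin((i + 0.5) * h)) for i in range(m)) / m


def spectral_from_rho(tau, rho, r_max=30.0, steps=1500):
    """(1/2pi) * integral over [0, r_max] of J0(r tau) rho(r) r dr, by Simpson's rule."""
    h = r_max / steps                      # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        r = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * bessel_j0(r * tau) * rho(r) * r
    return total * h / 3 / (2 * math.pi)


a, tau = 1.0, 1.0
numeric = spectral_from_rho(tau, lambda r: math.exp(-a * r))
closed = a / (2 * math.pi * (a * a + tau * tau) ** 1.5)
print(numeric, closed)
```

The truncation at `r_max = 30` is harmless here because $e^{-ar}$ is negligible beyond it; a more slowly decaying $\rho(r)$ would need a longer range.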

We can, of course, consider an "elliptical" Markoff process given by*

. [V 2muv tH 1/a 

(42) p(u, v) = exp 

but, in what follows, to simplify the computation, $m$ will be taken as zero, so
that by changing the units in which $d_1$ and $d_2$ are measured, we will work with a
process $\rho(r) = e^{-ar}$.


TABLE 4

Comparison of observed serial correlations with theoretical values obtained from a
centrally-symmetric correlation function

Distance          Rows                Columns             North-east          South-east
in miles    Observed  Calculated  Observed  Calculated  Observed  Calculated  Observed  Calculated
1             0.332     0.368       0.310     0.368        -         -           -         -
√2              -         -           -         -        0.264     0.243       0.264     0.243
2             0.149     0.135       0.090     0.135        -         -           -         -
2√2             -         -           -         -        0.050     0.059       0.129     0.059
3             0.009     0.050      -0.029     0.050        -         -           -         -
3√2             -         -           -         -       -0.050     0.018       0.070     0.018
4             0.034     0.018      -0.041     0.018        -         -           -         -
4√2             -         -           -         -       -0.020     0.004       0.060     0.004


This process does not seem to be far removed from the type of correlation
function experienced in agricultural field work.$^9$ Osborne [4] has mentioned
the possible use of $\rho_u = e^{-\lambda u}$. Mahalanobis [5] has calculated correlations for a
paddy field of 800 cells; his values are shown in table 4, together with values of
the function $e^{-r}$. Bearing in mind that the standard error of each of Mahalanobis'
values is approximately 0.035, the fit is seen to be quite good, although an
elliptical process with axes running south-east and north-east would undoubtedly
fit the observations better.
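
The calculated values in table 4 are described as values of $e^{-r}$ at the tabulated distances. The short sketch below recomputes that function; the variable names are illustrative, and the distances $\sqrt{2}$, $2\sqrt{2}$, ... are taken to be the diagonal (north-east and south-east) separations.

```python
import math

# Distances in miles: whole numbers along rows/columns, multiples of sqrt(2) along diagonals.
distances = [
    ("1", 1.0),
    ("sqrt(2)", math.sqrt(2)),
    ("2", 2.0),
    ("2*sqrt(2)", 2 * math.sqrt(2)),
    ("3", 3.0),
    ("3*sqrt(2)", 3 * math.sqrt(2)),
    ("4", 4.0),
    ("4*sqrt(2)", 4 * math.sqrt(2)),
]

# Correlation e^{-r} rounded to three decimals, keyed by the distance label.
calculated = {label: round(math.exp(-r), 3) for label, r in distances}

for label, value in calculated.items():
    print(f"{label:>10}  {value:.3f}")
```

The recomputed values agree with the calculated entries at distances 1, $\sqrt{2}$, 2, $2\sqrt{2}$, 3 and 4. At $3\sqrt{2}$ and $4\sqrt{2}$ the function gives about 0.014 and 0.003, whereas the entries in this copy read 0.018 and 0.004.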


andp u, 


exp 


it,p 

w « < 


- 

u V 

1 

1 

a 6 

f 


,r will be called the circular Markoff process, while put => pl^pl* 1 
will be known as degenerate Markoff processes of the first and 


second orders. 

9 This is further supported by the fact that using a function of this kind it is possible to
obtain numerically a law in substantial agreement with Fairfield-Smith's law over a wide
range of values.






































10. The relative efficiencies of systematic and stratified random sampling. 
Ideally the correlation functions developed in the last section should be used
in the expressions (19)-(27), but these functions are not capable of easy integration.
An alternative approach can be made if we note that


(/(shsta) — a 


(43) 


j r( 1 ~ l iI) F(v ’ di)Sv 

Q2 J-dj \ 02 / 


+ 


where 


F(u, da) — j-1 j -j- puv^v -j- f puv^v dj Pu.iijt j , 

02 L J 0 02 Jdl »-i J 

F(v, dl) = -j J j puvfiU "4“ f puvSu ■ di ^ pd ± u,v I > 

di L J o di Jd, «-i J 


It is seen that $F(u, d_2)$ and $F(v, d_1)$ are extensions of the expressions obtained for
$(\sigma^2_{st} - \sigma^2_{sy})/\sigma^2$ in section 2. Hence, if $F(u, d_2)$ and $F(v, d_1)$ are both positive
functions, systematic sampling is more accurate than stratified random sampling.
A particular case of this occurs when $\rho_{uv} = \rho_1^{|u|}\rho_2^{|v|}$. However, when $\rho_{uv} = \exp\{-(u^2 + v^2)^{1/2}\}$,
$F(u, d_2)$ is not always positive, since, as $u$ increases, $\rho_{uv}$ becomes a
convex function of $v$. This complicates the interpretation of (43) greatly since it
appears that as $u$ varies from 0 to $d_1$, $F(u, d_2)$ varies from a positive value to an unknown
value $X$. This value will be positive if $d_2 \gg d_1$ and negative if $d_1 \gg d_2$, so
that if the sampling is disproportionate in the two directions systematic sampling
will be more efficient than stratified random sampling. Furthermore, if $d_1 = d_2 = d$
and $d \to 0$, $F(u, d) \to \infty$ and systematic sampling again appears to be more
efficient. Thus in a wide variety of cases this type of systematic sampling, i.e.
$sy_0 sy_0$, gives a more accurate result than random sampling.


11. Estimation of sampling errors. An examination of formulas (7)—(18) 
shows that the principles used for the estimation of linear errors can be used in 
plane sampling. If we consider that each sample can be broken up into inde¬ 
pendent units each of which is situated in one of s strata, then for q replications 
we will have qr — a degrees of freedom for error. For example, ror 0 , r 0 n , akr 0 
and ston will have qnin 2 — 1, qm — 1, qnin 2 — ni and qn 2 — 1 degrees of freedom 
respectively, so that a single sample will contain an unbiased estimate of error, 
but stoats , siosfi , shsti , sy a sy a and sy x syi will have nm^iq — 1), n 2 (q — 1), q — 1 
and g—l degrees of freedom and will require replication to form a valid estimate 
of error. We can, however, use the method of splitting our sample into several
parts, each of which will give a fairly accurate estimate of error. We may, again,
consider the possibility of using a set of systematic samples, which are evenly
spaced, to estimate the sampling error, and we will see that the exclusion of the
$\rho$'s of lower order may lead to appreciable bias unless the correlation between





successive terms of the sample is small, but, as Yates has pointed out, this 
method will provide an upper limit for our sampling error. These methods of 
sampling are illustrated by the examples given below. 

12. Examples. We shall consider the three methods of estimating the sampling 
errors of a systematic sample: 

(1) using sets of systematic samples randomly placed with respect to each 
other, i.e. the material to be sampled is broken up into a series of sub-areas 
or blocks and several systematic samples are taken in each block; the 
error variance is calculated from the variances of the systematic samples 
in each block, 

(2) using one set of systematic samples randomly placed, i.e. several sys¬ 
tematic samples are taken and the area is then broken up into sub-areas 
or blocks; the error variance is calculated from the variances of the 
portions of the systematic samples in each block, 

(3) using one systematic sample i.e. one systematic sample is taken which is 
broken into several systematic samples of wider spacing, e.g. four samples 
at four times the original spacing, the area is then divided into several 
sub-areas and the error variance is calculated from the variances of the 
portions of the sub-systematic samples in each block. 
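
Method (3) can be sketched in code. This is one reading of the description, not the author's procedure verbatim: the function name, the argument names and the equal-weight pooling of the within-block variances are assumptions. A single systematic sample at spacing $k/q$ is split into $q$ sub-systematic samples at spacing $k$; within each block the variance among the $q$ sub-sample means, multiplied by the number of terms $n$ in each portion, is taken as an estimate of the variance per term on $q - 1$ degrees of freedom.

```python
import statistics


def method3_variance_per_term(series, block_len, k, q, start=0):
    """Pooled per-term error variance from one systematic sample split into q sub-samples.

    series    : the full population or series, as a list of numbers
    block_len : length of each block (sub-area)
    k         : spacing of each sub-systematic sample
    q         : number of sub-systematic samples (k must be divisible by q)
    start     : position of the first sampled term within a block (start < k // q)
    """
    if k % q != 0 or block_len % k != 0:
        raise ValueError("k must be divisible by q, and block_len by k")
    fine = k // q                 # spacing of the original systematic sample
    if not 0 <= start < fine:
        raise ValueError("start must lie in [0, k // q)")
    n = block_len // k            # terms of each sub-sample falling in one block
    block_variances = []
    for b in range(0, len(series) - block_len + 1, block_len):
        means = []
        for j in range(q):
            portion = [series[b + start + j * fine + m * k] for m in range(n)]
            means.append(statistics.fmean(portion))
        block_variances.append(statistics.variance(means))   # q - 1 degrees of freedom
    return n * statistics.fmean(block_variances)


# Hypothetical series in which the j-th sub-sample always sees the value j.
demo = [(i // 4) % 4 for i in range(160)]
print(method3_variance_per_term(demo, block_len=80, k=16, q=4))
```

With blocks of 80 terms, $k = 16$ and $q = 4$ this gives $n = 5$ and three degrees of freedom per block, the configuration used in example b below.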

These three methods are increasingly accurate in their estimation of the 
mean, increasingly biased in their estimation of the sampling variance, and 
decreasingly difficult in their practical application, so that our method of sam¬ 
pling may vary according to the population and according to the use to which the 
results are to be put. It is, for example, conceivable that subsequent sampling 
will yield an improved estimate of error so that initially only a rough guide 
may be required. 

a. If we are sampling from a continuous linear population with a large number 
of observations in each part into which we split our series, methods (1) and (2) 
will both give accurate estimates of the variance per term 

* ~ \ l Pu5u + 2 ^ p*»y 

Method (3) will, however, estimate a 2 instead of the correct variance per term, 
which is 

* 0 ~ t 1 

Thus the estimates of sampling variance by method (3) will in general be higher 
than the estimates by methods (1) and (2), although the actual variance will be 
lower. 

b. Kendall [6, 7] has constructed 480 terms of an artificial series $u_{n+2} =
1.1u_{n+1} - 0.5u_n + \epsilon_{n+2}$ where the $\epsilon_n$ are rectangularly distributed from $-49$
to 49. For this series $\sigma^2 = 2379.81$ and $s^2 = 2535.11$. The series was split in six
parts of 80 terms, for each of which $n = 5$, $k = 16$, $q = 4$, so that 18 degrees of
freedom were available for error. The results for this sampling configuration are





given in table 5. The values in this table corroborate the conclusions for large 
samples of continuous populations. 
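
The quoted value $\sigma^2 = 2379.81$ is consistent with the stationary variance of the scheme $u_{n+2} = 1.1u_{n+1} - 0.5u_n + \epsilon_{n+2}$. The check below is a sketch under an assumption that is not stated in the text: the disturbance variance is taken as $\sum_{j=-49}^{49} j^2/98 = 825$, that is, the integers from $-49$ to 49 with divisor 98. The stationary-variance formula for a second-order autoregression is standard; the function name is illustrative.

```python
def ar2_stationary_variance(a, b, noise_var):
    """Stationary variance of u[t] = a*u[t-1] - b*u[t-2] + e[t], with var(e) = noise_var."""
    return noise_var * (1 + b) / ((1 - b) * ((1 + b) ** 2 - a ** 2))


# Assumed disturbance variance: integers -49..49, sum of squares divided by 98.
noise_var = sum(j * j for j in range(-49, 50)) / 98

sigma2 = ar2_stationary_variance(1.1, 0.5, noise_var)
print(round(sigma2, 2))   # → 2379.81
```

With the divisor 99 instead, the same formula gives about 2355.8, so the agreement depends on the assumed divisor.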

c. A number of uniformity trials were taken and sampled according to the
systems $st_1 st_1$ and $sy_1 sy_1$. For sampling according to the system $st_1 st_1$ the error
TABLE 5

Comparison of three methods of estimating the sampling error of systematic samples
for an autoregressive scheme

Method   Estimate of sampling variance     $E(s^2)$       True sampling
         per term, $s^2$, based on 18                     variance per term
         degrees of freedom
(1)      3228                              [illegible]    2170
(2)      1872                              [illegible]    2167
(3)      3709                              [illegible]    423


TABLE 6

Comparison of efficiencies of different methods of sampling on three uniformity trials

Source                               Kalamkar [8]   Wiebe [9]    Wynne Sayer and Krishna Iyer [10]
No. in Cochran's [11] catalogue      72             132          108
Crop                                 Potatoes       Wheat        Sugar cane
No. of plots                         576            1440         960
Mean                                 23.262         587.95       270.89
Variance per term                    15.555         10,018.0*    1794.42

                                     Kalamkar [8]               Wiebe [9]                  Wynne Sayer and Krishna Iyer [10]
Type of sampling                     st1st1   sy1sy1   sy1sy1   st1st1   sy1sy1   sy1sy1   st1st1    sy1sy1   sy1sy1
Proportion sampled                   1/6      1/6      1/6      1/9      1/9      1/9      1/8       1/8      1/8
Method of estimating error           -        (2)      (3)      -        (2)      (3)      -         (2)      (3)
No. of partitions                    1        4        4        1        4        4        1         5        5
n1                                   3        3        6        4        2        4        4         2        4
k1                                   2        2        1        3        6        3        2         4        2
n2                                   16       2        4        20       5        10       15        3        6
k2                                   6        12       6        6        6        3        8         8        4
q                                    2        4        1        2        4        1        2         4        1
Mean                                 23.140   23.435   23.323   586.64   598.66   275.29   276.29    260.72   271.27
Estimated variance per term          9.783    2.689    4.889    6161.6   5772.7   7038.5   1320.16   799.29   1269.54
Degrees of freedom of estimated
  variance                           48       12       12       80       12       12       60        15       15

* Based on the original 1500 plots.


was estimated by taking two samples per stratum, while, for sampling according
to the system $sy_1 sy_1$, the error was estimated by comparing sets of four samples
in each part of the series by methods (2) and (3). The results of this sampling are
shown in table 6. While the number of trials is small, the trend to be seen in the
results agrees very well with the conclusions reached above.













13. Trend in the population. Frequently in taking samples from a population, 
we are faced with the problem of a trend. This will not greatly affect random and 
stratified random samples as estimates of the population mean, but the efficiency 
of systematic samples will be affected to a large extent. If we consider linear
sampling, and denote by $S_i$ the sample whose first element is $x_i$, then the set of
samples $S_i$ will usually be monotonic with $i$ and the difference between $S_1$ and
$S_k$ will be large (roughly equal to $x_1 - x_k$).

Yates [1] has suggested a method to overcome this difficulty; by letting $S_i$
represent


n 


- X{ + Xi+k + ■ • ■ + 3.'+(71-2)1 + -—^—- Xji+(n-l)*l ) 


the difference between systematic samples due to trend is largely removed. 
It is easily seen that this necessitates a small loss of information, and in particular, 
for a continuous random population the variance is (n — £)v 2 /(n — l) a instead 
of a/n. For plane samples, the corresponding adjusted sample will be 


S<i = 


(tti - 1)(H2 - 1) 


" V 
_fafa 


x <i + jr £**,./ + 


+ 


Ufa — j) 

kl /c 2 


®*+0n— 


+ ^ x t,m t + xt+h,m, + ^ 


+ 


i{h - j) „ 
fa fa 


+ 


+ 


(fa - j)(k1 - j) 
fa fa 


•Tn-Cni—1 ,/+(n» 


- 1 )*, 


with a similar loss of information. 

Trend is, however, most likely to be appreciable in large samples, and in this 
case, the loss of information due to end adjustments is negligible, so that the 
conclusions reached above will remain unaltered. 

The author wishes to thank Dr. F. Yates and Professor M. S. Bartlett for 
advice in the preparation of this paper. 


REFERENCES 

[1] W. G. Cochran, "The relative accuracy of systematic and stratified random samples
for a certain class of populations," Annals of Math. Stat., Vol. 17 (1946), p. 164.

[2] F. Yates, "A review of recent statistical developments in sampling and sampling
surveys," Roy. Stat. Soc. Jour., Vol. 109 (1946), p. 12.

[3] G. Dedebant and P. Wehrlé, "Mécanique aléatoire," Portugaliae Physica, Vol. 1
(1945).

[4] J. G. Osborne, "Sampling errors of systematic and random surveys of cover-type
areas," Am. Stat. Asso. Jour., Vol. 37 (1942), p. 266.

[5] P. C. Mahalanobis, "On large-scale sample surveys," Roy. Soc. Phil. Trans., B, 231
(1944), p. 329.

[6] M. G. Kendall, "On the analysis of oscillatory time-series," Roy. Stat. Soc. Jour.,
Vol. 108 (1946), p. 93.

[7] M. G. Kendall, Contributions to the Study of Oscillatory Time-Series, Nat. Inst. Econ.
Soc. Res., 1946.





[8] R. J. Kalamkar, "Experimental errors and the field-plot technique with potatoes,"
Jour. Agr. Sci., 1932, p. 373.

[9] G. A. Wiebe, "Variation and correlation in grain-yield among 1500 wheat nursery
plots," Jour. Agr. Res., 1935.

[10] Wynne Sayer and P. V. Krishna Iyer, "On some of the factors that influence the
error of field experiments with special reference to sugarcane," Ind. Jour. Agr.
Sci., 1930, p. 917.

[11] W. G. Cochran, "Catalogue of uniformity trial data," Roy. Stat. Soc. Suppl. Jour.,
Vol. 4 (1937), p. 233.

[12] F. Yates, "Systematic sampling," Roy. Soc. Phil. Trans., Vol. 241 (1948), p. 346.



REPRESENTATION OF PROBABILITY DISTRIBUTIONS BY
CHARLIER SERIES* 

By R. P. Boas, Jr. 

Brown University 

Summary. The paper describes some results concerning the representation 
of a function by linear combinations of the successive differences of the Poisson 
distribution, not necessarily the partial sums of the type B series of Charlier. 

1. Introduction. For various purposes it is often desired to expand a probability
distribution $f(x)$ in a series

(1) $f(x) \sim \displaystyle\sum_{k=0}^{\infty} c_k\theta_k(x),$

where the $\theta_k(x)$ are a given set of standard functions. Arguments of a heuristic
nature led Charlier [4, 5, 6] to suggest that it would be useful to take the $\theta_k(x)$
in (1) to be either the successive derivatives or the successive differences of some
fixed function; the two cases are often referred to as type A series and type B
series, respectively. Charlier gave formulas for determining the coefficients in the
two cases, but the question of whether the formal series represents the given
function in any reasonable sense has to be investigated separately for each
particular choice of the function generating the series. Only one special case of
each type has been much used: for the A-series, $\theta_0(x)$ is the normal density
function $(2\pi)^{-1/2}e^{-x^2/2}$; for the B-series, $\theta_0(x)$ is the Poisson function $e^{-\lambda}\lambda^x/x!$ (when $x$
is restricted to take only nonnegative integral values). We shall refer only to
these special cases when we speak of A- and B-series in this paper.

There are two distinct problems (which have, however, often been confused)
connected with the representation of a function $f(x)$ by a series (1); for convenience,
we shall refer to them in this paper as the practical problem and the
theoretical problem. In the practical problem, we have an empirical function $f(x)$,
defined only for a finite number of values of $x$, which we suspect is representable
by $c_0\theta_0(x)$ together with a small correction, so that we hope that a few (say three
or four) terms of (1) may give a good representation of $f(x)$ in a relatively simple
analytical form with a reasonable amount of computational labor. In some cases,
and certainly with the classical A- and B-series which we are considering, we
could represent, as closely as desired, any $f(x)$ (however irregular) which takes
nonzero values at only a finite number of points; but there is no interest in doing
this if the process involves finding too many terms of the series. (Neglect of this
fact has led to ill-founded statements by mathematicians about the satisfactory
nature of the A- or B-series; but see [27, pp. 38-39].)

Thus it would be of interest to know, if possible, under what circumstances a
given empirical density can be represented fairly well by a few terms of a series
of a given kind. If no simple criterion can be given, it is desirable to have a means
* Address delivered by invitation at the meeting of the Institute at Boulder, Colorado,
on September 1, 1949.


376 



REPRESENTATIONS OF DISTRIBUTIONS 


377 


of computing coefficients which will make a few terms of (1) give the best possible 
fit—best possible being defined in a way appropriate for the problem at hand. 

In the theoretical problem, $f(x)$ is a function defined for all values of $x$, or at least
for all of an infinite set of equally spaced values of $x$, arising from theoretical
considerations which suggest $c_0\theta_0(x)$ as a reasonable first approximation to $f(x)$.
For example, the central limit theorem states that under certain conditions the
cumulative distribution function of the sum of a large number of independent
random variables is approximately normal; then we might expect that this
distribution function would be representable by a series (1) with $\theta_0(x)$ the normal
distribution function. For such theoretical purposes we should like to have
criteria for the representability of a sufficiently general $f(x)$ by a series (1),
where representability is of course to be interpreted appropriately, as ordinary
convergence, uniform convergence, convergence in mean square, asymptotic
representation, etc., according to the requirements of the problem at hand. The
larger the class of $f(x)$ for which we can prove a representation theorem, the
larger is the possible domain of applicability of the series to theoretical problems.

2. The A-series. This paper is concerned with the B-series, but for comparison
we first mention some properties of the A-series. In the case of the classical
A-series, we have the attractive fact that the functions $\theta_n(x)$ are orthogonal
with weight function $e^{x^2/2}$, that is,

$$\int_{-\infty}^{\infty} \theta_n(x)\theta_m(x)e^{x^2/2}\,dx = 0, \quad m \ne n.$$

In fact, $e^{x^2/2}\theta_n(x)$ is, except for a numerical factor, the $n$th Hermite polynomial.
This orthogonality property enables one to compute the coefficients in a series (1)
with great ease from

(2) $n!\,c_n = \displaystyle\int_{-\infty}^{\infty} f(x)\theta_n(x)e^{x^2/2}\,dx,$

or since $\theta_n(x)e^{x^2/2}$ is a polynomial, from the moments of $f(x)$. By the classical
theory of orthogonal functions, this means that if the $c_n$ are so computed, and we
take $N + 1$ terms of the series, we minimize

(3) $\displaystyle\int_{-\infty}^{\infty} e^{x^2/2}[f(x) - F_N(x)]^2\,dx$

for all possible sums

(4) $F_N(x) = \displaystyle\sum_{n=0}^{N} a_n\theta_n(x).$
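
The orthogonality of the $\theta_n(x)$ under the weight $e^{x^2/2}$ can be checked numerically. The sketch below is an illustration, not part of the paper: it assumes $\theta_0(x) = (2\pi)^{-1/2}e^{-x^2/2}$ and uses the standard fact that the $n$th derivative of this density is $(-1)^n He_n(x)\theta_0(x)$, with $He_n$ the Hermite polynomial defined by $He_{n+1} = xHe_n - nHe_{n-1}$. The function names, integration range and step are choices made for the example.

```python
import math


def hermite_he(n, x):
    """Hermite polynomial He_n(x) from He_{k+1} = x He_k - k He_{k-1}."""
    prev, cur = 1.0, x
    if n == 0:
        return prev
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


def theta(n, x):
    """n-th derivative of the normal density: (-1)^n He_n(x) phi(x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
    return (-1) ** n * hermite_he(n, x) * phi


def weighted_inner(n, m, lo=-12.0, hi=12.0, steps=4800):
    """Trapezoidal estimate of the integral of theta_n(x) theta_m(x) exp(x^2/2) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * theta(n, x) * theta(m, x) * math.exp(0.5 * x * x)
    return total * h


print(weighted_inner(1, 3), weighted_inner(3, 3), math.factorial(3) / math.sqrt(2 * math.pi))
```

Off-diagonal integrals vanish to rounding error. The diagonal value found here, $n!/\sqrt{2\pi}$, depends on the normalisation assumed for $\theta_0$ in the sketch.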

The convergence theory of Hermite series has been thoroughly investigated by
mathematicians, so that it would appear that in theoretical problems, in which
$f(x)$ is given for all values of x, we are in a position to find out everything about
the representation of $f(x)$ by an A-series. Also in problems of practical
curve-fitting, the fact that the closest approximation to $f(x)$ (in the sense (3)) by sums
of the form (4) is given by choosing the coefficients according to (2) seems to
leave no more to be said.

However, the formal elegance of the A-series seems to be somewhat misleading.
Even when a series converges it by no means follows that its Nth partial sum is
the best selection of N terms for representing a given function. Even though the
partial sums do give the best fit in the sense of (3), it may not be desirable to
measure the closeness of approximation by (3); some other measure of
approximation may be better suited to the end in view. For example, it is known that
the partial sums of Edgeworth's series (see [8]), which is a rearrangement of the
A-series, are more satisfactory for some purposes than the partial sums of the
A-series with the coefficients determined by (2). More precisely, Edgeworth's
series furnishes an asymptotic expansion, with a remainder term whose order of
magnitude can be estimated quite precisely, in circumstances where the series of
orthogonal functions does not do this. Again, for practical purposes a few terms
of the A-series sometimes exhibit undesirable properties (such as negative
frequencies). If $f(x)$ is a function defined only for integral values of x, A. Fisher
[10] has suggested and applied the idea of minimizing, not (3), but the sum
$\sum_x |f(x) - F_N(x)|^2$ in order to determine the coefficients of the approximating
sums.

3. The B-series. We can now see how the status of the B-series resembles or
differs from that of the A-series. Here we deal principally with a function defined
for integral values of x; $\theta_0(x) = \theta(x) = e^{-\lambda}\lambda^x/x!$, $\Delta\theta(x) = \theta(x) - \theta(x-1)$,
$\Delta^k\theta(x) = \Delta(\Delta^{k-1}\theta(x))$ and $\theta_k(x) = \Delta^k\theta(x)$; $\theta(x)$ is taken to be 0 for negative
integral x. We shall refer to this as the discrete case of the B-series. The
literature of the subject contains a number of rather painful attempts to put the
coefficients into usable form, persisting even after the simple formula

(5)    $c_n = \frac{1}{n!}\sum_{k=0}^{n}\binom{n}{k}(-1)^k\,\lambda^{n-k}\,\mu_{[k]}$

had been obtained, where $\mu_{[n]}$ is the nth factorial moment,

$\mu_{[n]} = \sum_{k=n}^{\infty} f(k)\,k!/(k-n)!.$

Formula (5) can be derived, for example, by using orthogonality properties of
the $\theta_n(x)$. We have, in fact, that $\sum_{x=0}^{\infty}\theta_n(x)\theta_m(x)/\theta_0(x)$ is 0 or $n!\,\lambda^{-n}$ according as
$n \neq m$ or $n = m$.
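As an illustration of formula (5), the following is a minimal sketch in Python, assuming $f$ is supplied as a finite list of values over x = 0, 1, 2, ... (the function names are hypothetical, not from the paper). It computes the factorial moments and the classical coefficients $c_n$; for a Poisson $f$ with the same $\lambda$ one would expect $c_0 = 1$ and all later coefficients to vanish.

```python
from math import comb, exp, factorial, perm

def poisson(x, lam):
    """theta(x) = e^{-lam} lam^x / x!, taken as 0 for negative x."""
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def factorial_moment(f, k):
    """mu_[k] = sum_x f(x) x!/(x-k)!; perm(x, k) is 0 when k > x."""
    return sum(fx * perm(x, k) for x, fx in enumerate(f))

def b_series_coeffs(f, lam, n_max):
    """Classical B-series coefficients c_0..c_{n_max} from formula (5)."""
    mu = [factorial_moment(f, k) for k in range(n_max + 1)]
    return [
        sum(comb(n, k) * (-1)**k * lam**(n - k) * mu[k] for k in range(n + 1))
        / factorial(n)
        for n in range(n_max + 1)
    ]

# Illustrative input (an assumption): a Poisson law truncated far in the tail.
f = [poisson(x, 2.0) for x in range(60)]
c = b_series_coeffs(f, 2.0, 4)
```

Choosing $\lambda$ equal to the mean of $f$ makes `b_series_coeffs(f, lam, 1)[1]` vanish, since $c_1 = \lambda\mu_{[0]} - \mu_{[1]}$.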

The parameter $\lambda$ in the B-series is at our disposal, and can for example be
chosen in such a way as to improve the convergence of the series. For purposes of
practical curve-fitting, it has been customary to choose $\lambda$ equal to the mean of
the distribution $f(x)$, a choice which makes the coefficient $c_1$ of $\Delta\theta$ equal to zero.
Charlier also suggested other methods in which $c_1$ and $c_2$, or $c_1$, $c_2$ and $c_3$ are
zero [7]. Such choices, of course, may reduce the amount of computation needed
to make use of a given number of differences in fitting a curve; aside from this
consideration their use seems to depend on the belief that one improves the
convergence of a series by adjusting any available parameters so that as many as
possible of the initial terms of the series are zero. This belief does not always
seem to be confirmed by the facts. (In particular, compare columns 2 and 5 of
Table 1, columns 2 and 4 of Table 2, or columns 2 and 4 of Table 3.)

The theoretical problem of what $f(x)$ can be represented by convergent
B-series has been studied by several authors [12, 13, 17, 19, 20, 21, 23, 24, 26, 28];
the study by Schmidt [24; see also 25 and 17] gives necessary and sufficient
conditions for the representation in the case of a nonnegative $f(x)$, so that, at
least in all cases of interest in statistics, the theoretical problem seems to be
completely solved. However, one of the purposes of the present paper is to
reopen this apparently closed problem.

There is also a continuous version of the B-series, which is suggested by the
fact that

(6)    $\theta(x) = (2\pi)^{-1} e^{-\lambda}\int_{-\pi}^{\pi} e^{-ixu}\exp(\lambda e^{iu})\,du$

reduces to the Poisson function $e^{-\lambda}\lambda^x/x!$ for positive integral x (and to 0 for
negative integral x). This form of the B-series has not been much used, and its
use is subject to suspicion since it has rather peculiar properties. In particular,
it cannot represent, in any reasonable sense, a positive function $f(x)$ or one which
is too small as $x \to \infty$ [26, 3]; since the functions which present themselves for
representation in practice are both positive and small at infinity, the continuous
case of the B-series looks unpromising for applications. (See also [27a], [1a].)
However, it has been applied [15].
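The reduction of (6) to the Poisson function at integers can be checked numerically. The sketch below is only an illustration under stated assumptions: it approximates the integral in (6) by a midpoint rule (a choice made here, not taken from the paper), and the function name and step count are hypothetical.

```python
import cmath
from math import exp, factorial, pi

def theta_integral(x, lam, steps=2000):
    """Midpoint-rule approximation of (6):
    (2*pi)^-1 e^{-lam} * integral over (-pi, pi) of e^{-ixu} exp(lam e^{iu}) du."""
    h = 2 * pi / steps
    total = 0j
    for j in range(steps):
        u = -pi + (j + 0.5) * h
        total += cmath.exp(-1j * x * u) * cmath.exp(lam * cmath.exp(1j * u))
    return exp(-lam) * total * h / (2 * pi)

# Compare with the Poisson function at a positive integer and at a negative one.
lam = 1.5
poisson_at_3 = exp(-lam) * lam**3 / factorial(3)
value_at_3 = theta_integral(3, lam)
value_at_minus_2 = theta_integral(-2, lam)
```

For integral x the integrand is periodic, so the rule converges very quickly; for fractional x the same function gives the continuous interpolation that (6) defines.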

The purpose of this paper is to describe some results on the B-series which
have been obtained in a mathematical paper [3], devoted to what we have
called the theoretical problem; some contributions to the practical problem
will also be given in the present paper. The starting point of this investigation
was the question of what happens if one tries to approximate a function, not
by the partial sums of the series (1), but by some other combination of the
first N functions $\theta_n(x)$, when approximation is taken in the sense of (unweighted)
least-squares. This method of approximation seems well adapted to statistical
problems, and leads to simpler mathematical work than ordinary point-by-point
convergence of the partial sums. The B-series itself gives a least-squares
approximation with a weight function $1/\theta_0(x)$. We consider here only the classical B-series,
when $\theta_0(x) = \theta(x) = e^{-\lambda}\lambda^x/x!$, $\theta_n(x) = \Delta^n\theta_0(x)$; the main results are substantially
the same for rather more general cases [3; see also 14, 25]. In addition, here we
consider only nonnegative $f(x)$, assumed zero for negative x. Functions which
need not be zero for negative x are handled easily by generalizing the B-series
to the form [3]

(7)    $f(x) \sim \sum_{n=1}^{\infty} b_n\,\nabla^n\theta(x) + \sum_{n=0}^{\infty} a_n\,\Delta^n\theta(x),$

where $\nabla$ denotes the advancing difference: $\nabla\theta(x) = \theta(x) - \theta(x+1)$; there
seems to be no particular reason (other than a historical one) for preferring one
kind of difference to the other. The generalized series (7) might be useful for
graduating symmetrical probability distributions, although it does not seem to
have been considered in the literature (cf. [1a]).

4. Results: practical problem. Our question takes somewhat different forms
in the two cases which we have described as the practical and the theoretical.
In the former, we ask what the coefficients $a_k^{(N)}$ should be so that

(8)    $\sum_{x=0}^{\infty}\Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2$

shall be a minimum, where $f(x)$ is an empirically given function and N is a given
integer, in general not very large. If N is 0, 1 or 2, that is, if we use 1, 2 or 3
terms, the best choice of the $a_k^{(N)}$ in (8) can be calculated without difficulty.

For N = 0, our question is that of finding the best least-squares fit to $f(x)$
by a Poisson distribution $a_0^{(0)} e^{-\lambda}\lambda^x/x!$; the best choice of $a_0^{(0)}$ is then

(9)    $a_0^{(0)} = e^{\lambda}\Bigl\{\sum_{x=0}^{\infty} f(x)\,\lambda^x/x!\Bigr\}\Big/ J_0(2i\lambda),$

where

$J_0(iy) = 1 + \frac{(y/2)^2}{(1!)^2} + \frac{(y/2)^4}{(2!)^2} + \cdots$

($J_0$ denotes the Bessel function of order 0); on the other hand, the usual formula
(5) gives the different coefficient

$c_0 = \mu_{[0]} = \sum_{x=0}^{\infty} f(x).$

This, of course, is simpler than (9) to compute, although its use is based on the
uncritical assumption that the first term of the series (1) is the best one to take
if only one term is to be used. Charlier [7; see also 10, pp. 101-103] suggested a
different formula in which one uses, not $\Delta^k\theta(x)$, but $\Delta^k\theta(px + q)$, the parameters
p, q, $\lambda$ being adjusted to make the terms of (1) in $\Delta\theta$, $\Delta^2\theta$, $\Delta^3\theta$ all zero; here $\theta(x)$
is defined when x is not an integer by interpreting $e^{-\lambda}\lambda^x/x!$ as $e^{-\lambda}\lambda^x/\Gamma(x + 1)$,
and not by using formula (6). Table 2 shows that in at least one numerical case
(9) gives a better least-squares fit than Charlier's method (and without
introducing gamma functions to take care of $\theta(x)$ for fractional x). However, it is
not excluded that Charlier's method will give better results in other cases,
since with the change of the functions $\theta_n(x)$ the results of this paper cease to
apply.
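Formula (9) can be sketched in a few lines of Python. This is a minimal illustration, assuming $f$ is given as a finite list over x = 0, 1, 2, ...; the function names are hypothetical, and $J_0(2i\lambda)$ is evaluated from its power series rather than from tables.

```python
from math import exp, factorial

def bessel_j0_imag(y, terms=60):
    """J_0(iy) = sum_k (y/2)^(2k) / (k!)^2, which is real for real y."""
    return sum((y / 2)**(2 * k) / factorial(k)**2 for k in range(terms))

def best_poisson_scale(f, lam):
    """a_0^(0) from formula (9): the least-squares multiple of the Poisson term."""
    s = sum(fx * lam**x / factorial(x) for x, fx in enumerate(f))
    return exp(lam) * s / bessel_j0_imag(2 * lam)

# Illustrative data (an assumption, not from the paper's tables).
lam = 1.3
f = [5.0, 9.0, 4.0, 1.0]
a0 = best_poisson_scale(f, lam)
```

The same number is obtained from the elementary least-squares ratio $\sum f(x)\theta(x) / \sum \theta(x)^2$, which is a convenient cross-check; the classical coefficient $c_0$ would instead be `sum(f)`.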

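For N = 1, we get the best least-squares approximation to $f(x)$ by

$a_0^{(1)}\theta(x) + a_1^{(1)}\Delta\theta(x)$

with

(10)    $e^{-2\lambda}a_0^{(1)} = \frac{S_0 + S_1}{\alpha + \beta}, \qquad e^{-2\lambda}a_1^{(1)} = \frac{\beta S_0 - \alpha S_1}{\alpha^2 - \beta^2},$

where $S_0 = \sum_{x=0}^{\infty} f(x)\theta(x)$, $S_1 = \sum_{x=0}^{\infty} f(x)\theta(x - 1)$, $\alpha = J_0(2i\lambda)$, $\beta = -iJ_1(2i\lambda)$,
the J's again denoting Bessel functions. For N = 2, the corresponding
formulas involve also $\gamma = -J_2(2i\lambda)$ and $S_2 = \sum_{x=0}^{\infty} f(x)\theta(x - 2)$. Writing
$D = 2\beta^2 - \alpha^2 - \alpha\gamma$, they are:

(11)    $e^{-2\lambda}a_0^{(2)} = \frac{\beta - \alpha}{D}\,S_0 + \frac{2\beta - \alpha - \gamma}{D}\,S_1 + \frac{\beta - \alpha}{D}\,S_2,$

        $e^{-2\lambda}a_1^{(2)} = \frac{\beta\gamma - \alpha\beta + 2\beta^2 - 2\alpha\gamma}{(\alpha - \gamma)D}\,S_0 + \frac{\alpha + \gamma - 2\beta}{D}\,S_1 + \frac{2\alpha^2 - 2\beta^2 + \beta\gamma - \alpha\beta}{(\alpha - \gamma)D}\,S_2,$

        $e^{-2\lambda}a_2^{(2)} = \frac{\alpha\gamma - \beta^2}{(\alpha - \gamma)D}\,S_0 + \frac{\beta}{D}\,S_1 + \frac{\beta^2 - \alpha^2}{(\alpha - \gamma)D}\,S_2.$

The functions $i^{-n}J_n(iy)$ are real for real y, and extensive tables are available [32].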
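The closed forms (9), (10) and (11) can be checked against the normal equations of the least-squares problem (8). The following sketch is an illustration only: the function names are hypothetical, the Bessel quantities $\alpha$, $\beta$, $\gamma$ are computed from the power series of $i^{-n}J_n(iy)$ instead of tables, and $f$ is assumed to be a finite list over x = 0, 1, 2, ... with at least three entries.

```python
from math import comb, exp, factorial

def theta(x, lam):
    """Poisson term, taken as 0 for negative x."""
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def delta_theta(k, x, lam):
    """k-th difference, with Delta theta(x) = theta(x) - theta(x - 1)."""
    return sum((-1)**j * comb(k, j) * theta(x - j, lam) for j in range(k + 1))

def bessel_i(n, y, terms=60):
    """i^{-n} J_n(iy) = sum_k (y/2)^(2k+n) / (k! (k+n)!)."""
    return sum((y / 2)**(2 * k + n) / (factorial(k) * factorial(k + n))
               for k in range(terms))

def least_squares_coeffs(f, lam, N):
    """a_k^(N) for N = 0, 1, 2 from formulas (9), (10), (11)."""
    al, be, ga = (bessel_i(n, 2 * lam) for n in range(3))
    S = [sum(fx * theta(x - j, lam) for x, fx in enumerate(f)) for j in range(3)]
    e = exp(2 * lam)
    if N == 0:
        return [e * S[0] / al]
    if N == 1:
        return [e * (S[0] + S[1]) / (al + be),
                e * (be * S[0] - al * S[1]) / (al**2 - be**2)]
    D = 2 * be**2 - al**2 - al * ga
    a0 = ((be - al) * S[0] + (2 * be - al - ga) * S[1] + (be - al) * S[2]) / D
    a1 = ((be * ga - al * be + 2 * be**2 - 2 * al * ga) * S[0] / (al - ga)
          + (al + ga - 2 * be) * S[1]
          + (2 * al**2 - 2 * be**2 + be * ga - al * be) * S[2] / (al - ga)) / D
    a2 = ((al * ga - be**2) * S[0] / (al - ga) + be * S[1]
          + (be**2 - al**2) * S[2] / (al - ga)) / D
    return [e * a0, e * a1, e * a2]

def normal_equation_residuals(f, lam, a, x_max=80):
    """Inner products of the fit error with each Delta^k theta; all should vanish."""
    def err(x):
        fx = f[x] if x < len(f) else 0.0
        return fx - sum(ak * delta_theta(k, x, lam) for k, ak in enumerate(a))
    return [sum(err(x) * delta_theta(k, x, lam) for x in range(x_max + 1))
            for k in range(len(a))]
```

A least-squares minimum is characterised by the error being orthogonal to every fitting function, so small values from `normal_equation_residuals` indicate that the coefficients returned do minimise (8).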

Some numerical examples showing the comparison between graduation by
these formulas and by the corresponding number of terms of the B-series are
given in Tables 1-3. It will be noticed that (as the theory indicates) one gets a
better least-squares fit by formulas (9), (10) or (11) than by a corresponding
number of terms of the B-series using the coefficients (5). However, one may not
get a better fit if goodness of fit is measured in some other way, e.g. by $\chi^2$.
Unfortunately the coefficients calculated by this method increase rapidly in
complexity as the number of terms increases, and even the coefficients for N = 3
would involve very heavy algebra. Since numerical examples [2] indicate that it
is often necessary to go to terms in $\Delta^4\theta$ for a satisfactory fit, it might be worth
while to calculate the next few coefficients.


5. Results: theoretical problem. In the case of a theoretical distribution we
ask how coefficients $a_k^{(N)}$ should be determined so that

(12)    $\sum_{x=0}^{\infty}\Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2$

will tend to 0 as $N \to \infty$. The convergence to 0 of (12) is a rather strong kind of
convergence, since it implies convergence of the approximating sums to $f(x)$,
not only for each x, but even uniformly for all x. Of course, the "best" choice of
$a_k^{(N)}$ as above would be expected to give convergence under the weakest
hypotheses, but because of the complexity of these coefficients it seems desirable to
make (12) only approximately a minimum; this actually makes no difference
in the limit, although the approximation is not usually satisfactory for small
values of N. To see the connection between the formulas used here and the
"classical" formula (5) for the coefficients in (1), we note that (5) can be written

(13)    $c_n = \frac{(-1)^n}{n!}\sum_{x=0}^{\infty} f(x)\Bigl[\frac{\partial^n}{\partial z^n}\bigl(z^x e^{\lambda(1-z)}\bigr)\Bigr]_{z=1};$

(5) results if we expand the derivative by Leibniz's rule and rearrange the sum.
If we expand $e^{-\lambda z}$ in a power series before differentiating in (13), we obtain

$c_n = (-1)^n e^{\lambda}\sum_{x=0}^{\infty} f(x)\sum_{l=\max(x,n)}^{\infty}\binom{l}{n}\frac{(-\lambda)^{l-x}}{(l-x)!} = (-1)^n\sum_{l=n}^{\infty}\binom{l}{n}\sum_{k=0}^{l} f(l-k)\,e^{\lambda}\frac{(-\lambda)^k}{k!}.$

If now we break this series off at l = N to obtain

(14)    $a_n^{(N)} = (-1)^n\sum_{l=n}^{N}\binom{l}{n}\sum_{k=0}^{l} f(l-k)\,e^{\lambda}\frac{(-\lambda)^k}{k!},$

we obtain a sequence of approximations to $f(x)$ by sums $\sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)$ which
has, in general, much better convergence properties than the partial sums of the
B-series with coefficients $c_n$ given by (5). In particular, if $f(x) = 0$ for x = -1,
-2, ..., this sequence of approximations converges to $f(x)$ whenever $\sum_{x=0}^{\infty}|f(x)|^2$
converges; on the other hand, for nonnegative $f(x)$ it is known [24] that the
B-series converges if and only if $\lim_{x\to\infty} f(x)\,2^x x^k = 0$ for k = 0, 1, 2, ..., a much
more restrictive condition. If we demand that the partial sums of the B-series
converge in mean square, that is, that (12) tends to zero with $a_k^{(N)}$ independent of
N, we have the even more restrictive condition [3] that $\limsup_{x\to\infty}|f(x)|^{1/x} \le 1/3$.

The approximating sums with coefficients (14) have the additional property
that they reproduce $f(x)$ exactly for x = 0, 1, 2, ..., N. One would expect
that in general they would then tend to deviate rather widely from $f(x)$ for
larger x, and so would not be satisfactory for practical curve-fitting. However,
it seems possible that if we fit such a sum not to $f(x)$, but to $f(px + q)$, with
suitable integers p and q, thus making the approximation agree with $f(x)$ at a
set of values covering the whole range of definition of $f(x)$, it might give a
satisfactory fit elsewhere. This possibility has not been investigated; a similar
approach using the partial sums of the B-series was suggested by Charlier [7]
and Fisher [10].
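The interpolation property of the coefficients (14) is easy to observe numerically. The sketch below is a minimal illustration with hypothetical function names; it uses the observed frequencies of Table 1 with $\lambda = .631$ and assumes, as the table suggests, that the origin is shifted so that x = 5 petals corresponds to x = 0.

```python
from math import comb, exp, factorial

def theta(x, lam):
    """Poisson term e^{-lam} lam^x / x!, taken as 0 for negative x."""
    return exp(-lam) * lam**x / factorial(x) if x >= 0 else 0.0

def delta_theta(k, x, lam):
    """k-th difference, with Delta theta(x) = theta(x) - theta(x - 1)."""
    return sum((-1)**j * comb(k, j) * theta(x - j, lam) for j in range(k + 1))

def coeffs_14(f, lam, N):
    """a_n^(N), n = 0..N, from formula (14); f is a list over x = 0, 1, ..."""
    inner = [exp(lam) * sum(f[l - k] * (-lam)**k / factorial(k)
                            for k in range(l + 1))
             for l in range(N + 1)]
    return [(-1)**n * sum(comb(l, n) * inner[l] for l in range(n, N + 1))
            for n in range(N + 1)]

def approximant(a, x, lam):
    """Value of sum_k a_k Delta^k theta(x)."""
    return sum(ak * delta_theta(k, x, lam) for k, ak in enumerate(a))

petals = [133, 55, 23, 7, 2, 2]      # Table 1, column 1, origin shifted (assumption)
a = coeffs_14(petals, 0.631, 2)
fitted = [approximant(a, x, 0.631) for x in range(len(petals))]
```

With three terms (N = 2) the first three entries of `fitted` reproduce 133, 55 and 23, and the later entries come out close to the values 9.1, 2.6 and 0.5 listed in column 6 of Table 1.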


6. The continuous case of the B-series. In the continuous case we again ask,
not when

(15)    $f(x) = \sum_{n=0}^{\infty} a_n\,\Delta^n\theta(x)$

with uniform convergence in every finite interval, but when

(16)    $f(x) = \operatorname*{l.i.m.}_{N\to\infty}\sum_{n=0}^{N} a_n^{(N)}\Delta^n\theta(x),$

which means that

(17)    $\lim_{N\to\infty}\int_{-\infty}^{\infty}\Bigl| f(x) - \sum_{n=0}^{N} a_n^{(N)}\Delta^n\theta(x)\Bigr|^2 dx = 0.$

For (15) the following negative results are known [26]: if $f(x) \ge 0$, (15) cannot
converge uniformly on every finite interval (unless $f(x) \equiv 0$); the series, if
convergent uniformly on every finite interval, cannot converge to $f(x)$ unless
the Fourier transform of $f(x)$ vanishes outside $(-\pi, \pi)$, a condition which
automatically excludes any $f(x)$ which vanishes for all large $|x|$ or even is too
small as $x \to \infty$. Nevertheless, Jørgensen [15] applies the continuous case
successfully to practical problems. A possible explanation of this apparent discrepancy
is that if the $a_n^{(N)}$ in (16) are properly determined, (16) will be true under fairly
general conditions. To be sure, the mean square difference in (17) cannot be
made arbitrarily small unless the Fourier transform $g(x)$ of $f(x)$ vanishes outside
$(-\pi, \pi)$, but if $|f(x)|^2$ is integrable the difference can be made small if $g(x)$ is
itself small outside $(-\pi, \pi)$. If $g(x)$ does vanish outside $(-\pi, \pi)$, then (16) is true; and in fact
the coefficients $a_n^{(N)}$ can be taken the same as in (14), so that the approximating
sums depend only on the values of $f(x)$ for integral values of x; these values are
known to determine $f(x)$ under our hypotheses on $g(x)$.

TABLE 1
Number of petals on buttercups. $\lambda = .631$

          (1)         (2)           (3)           (4)            (5)            (6)
   x    Observed    Calculated    Calculated    Calculated     Calculated     Calculated
        frequency   3 terms       1 term        2 terms        3 terms        3 terms
                    (formula 5)   (formula 9)   (formula 10)   (formula 11)   (formula 14)

   5      133        134.9         119.9                        132.9          133.0
   6       55         51.6          75.6                         55.3           55.0
   7       23         22.5          22.5                         22.1           23.0
   8        7          9.6           5.0                          8.5            9.1
   9        2          2.9           0.8           0.0            2.4            2.6
  10        2          0.6           0.1           0.0            0.5            0.5

 Total    222        222.0                       207.7          221.7          223.2

7. Discussion of some numerical results. Table 1. Column 2 gives the fit by
two terms of the B-series (really three, since the coefficient of $\Delta\theta$ is zero when
formula (5) is used), as calculated by Charlier [7] (that is, using terms through
$\Delta^2\theta$). Column 3 gives the best least-squares fit by a single term, i.e., a Poisson
distribution, calculated by formula (9); it is clear that this term alone does not
represent the observations very well. Column 4 gives the best least-squares
fit by terms through $\Delta\theta$. Column 5 gives the best least-squares fit by terms
through $\Delta^2\theta$; the improvement over Charlier's fit by the same number of terms
is evident by inspection. Column 6 gives, for comparison, the same number
of terms calculated by formula (14), which gives an approximation to the best
least-squares fit and necessarily reproduces the data exactly for the first three
values of x. The fact that (14) gives good results here is presumably connected
with the small size of $\lambda$.

TABLE 2
Failure of grains of barley. $\lambda = 2.757$

          (1)         (2)           (3)           (4)            (5)
   x    Observed    Calculated    Calculated    Calculated     Calculated
        frequency   4 terms       1 term        2 terms        3 terms
                    (Charlier)    (formula 9)   (formula 10)   (formula 11)

   0                                47.3          49.9           48.4
   1                               130.4         134.7          133.4
   2      180         174          179.8         181.6          182.3
   3      170         151          165.3         163.2          164.3
   4      111         111          113.9                        109.8
   5       50          60           62.7          69.3           58.1
   6       22          32           28.8          26.5           25.2
   7       22          14           11.4          10.2            9.3
   8        7           6            3.9           3.4            2.9
   9        2           2            1.1           1.0            0.8
  10        1           0            0.3           0.2            0.2

 Total    749         752          744.9         740.0          734.7

Table 2. Column 2 gives the values calculated by Charlier [7] for a fit after
the linear transformation $x \to px + q$, with $\lambda$, p and q chosen to make the terms
in $\Delta\theta$, $\Delta^2\theta$, $\Delta^3\theta$ all zero (the values were read to the nearest integer from Charlier's
graph). Column 3 gives the best least-squares single-term fit calculated by
formula (9); this is a considerable improvement for $x \le 6$, but for the remainder
of the table it is rather poor. Column 4 gives the best least-squares fit by two
terms; column 5, that by three. The $\chi^2$-test indicates that the graduation is
rather poor in all cases.

Table 3. Column 2 gives the classical calculation with terms through $\Delta^2\theta$;
this was given by A. Fisher [10] and (more accurately) by Aroian [2]. Columns 3
and 4 give the best least-squares approximations by two and three terms;
column 4 is better than column 2, in this sense, as expected. However, column 4
is a poorer fit when tested by $\chi^2$, chiefly because of the poor fit at x = 0. It should
be noted that two more terms of the B-series give a more satisfactory fit [2].

TABLE 3
$\alpha$-particles from a bar of polonium. $\lambda = 3.87155$

          (1)         (2)            (3)             (4)
   x    Observed    Calculated     Calculated      Calculated
        frequency   3 terms        2 terms         3 terms
                    (formula 5)    (formula 10)    (formula 11)

   0       57         49.5           51.3            45.2
   1      203        201.3          213.3           190.9
   2      383        403.4          399.0           393.5
   3      525        532.3          524.8           529.8
   4      532        520.6          517.2           525.4
   5      408        402.6          407.7           409.7
   6      273        254.8          267.7           261.9
   7      139        137.1          150.6           141.1
   8       45         64.0           74.1            65.3
   9       27         26.1           32.4            26.3
  10       10          9.4           12.8             9.3
  11        4          3.0            4.6             2.9
  12        0          0.9            1.5             0.8
  13        1          0.2            0.5             0.2
  14        1          0.0            0.1             0.0

 Total   2608       2605.2         2657.6          2602.3

                  $\chi^2$ = 10.2    $\chi^2$ = 16.2     $\chi^2$ = 11.4
                    n = 7          n = 8           n = 7
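The least-squares comparison made in the discussion of Table 3 can be reproduced directly from the tabulated values. The short Python sketch below only sums squared differences between the observed and calculated columns; the variable and function names are illustrative.

```python
# Columns of Table 3 (alpha-particles from a bar of polonium), x = 0..14.
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 4, 0, 1, 1]
col2 = [49.5, 201.3, 403.4, 532.3, 520.6, 402.6, 254.8, 137.1,
        64.0, 26.1, 9.4, 3.0, 0.9, 0.2, 0.0]   # 3 terms, formula (5)
col3 = [51.3, 213.3, 399.0, 524.8, 517.2, 407.7, 267.7, 150.6,
        74.1, 32.4, 12.8, 4.6, 1.5, 0.5, 0.1]  # 2 terms, formula (10)
col4 = [45.2, 190.9, 393.5, 529.8, 525.4, 409.7, 261.9, 141.1,
        65.3, 26.3, 9.3, 2.9, 0.8, 0.2, 0.0]   # 3 terms, formula (11)

def sum_sq_error(obs, fit):
    """Unweighted sum of squared deviations, the quantity minimised in (8)."""
    return sum((o - c)**2 for o, c in zip(obs, fit))

errors = {name: sum_sq_error(observed, col)
          for name, col in (("col2", col2), ("col3", col3), ("col4", col4))}
```

On these figures the three-term least-squares fit (column 4) has a sum of squared deviations of roughly 1010, against roughly 1388 for the classical three-term fit (column 2), which is the sense in which column 4 is the better of the two.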


8. Proofs: theoretical problem. We now outline the proofs of the results which
we have stated. They depend on the fact that the numbers $\theta(x)$ (x = 0, ±1,
±2, ...) (where $\theta(x) = 0$ when x is a negative integer) are the Fourier coefficients
of the function $\varphi(u) = e^{-\lambda}\exp(\lambda e^{iu})$, i.e.

$\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}\,du, \qquad x = 0, \pm 1, \pm 2, \ldots$

Furthermore,

$\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,(1 - e^{iu})^k\,e^{-ixu}\,du.$

If we then assume the condition $\sum_{x=-\infty}^{\infty}|f(x)|^2 < \infty$, with $f(x) = 0$ for x =
-1, -2, ..., the numbers $f(x)$ are the Fourier coefficients of a function $g(u)$
of integrable square, by the Riesz-Fischer theorem from the theory of Fourier
series [31, p. 74]:

$f(x) = (2\pi)^{-1}\int_{-\pi}^{\pi} g(u)\,e^{-ixu}\,du, \qquad x = 0, \pm 1, \pm 2, \ldots$

Thus

(18)    $f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{-ixu}\Bigl[g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k\Bigr]du,$

and so the expressions on the left appear as the Fourier coefficients of the
expressions in square brackets on the right. By Parseval's theorem for Fourier series
[31, p. 76], then, we have

(19)    $\sum_{x=0}^{\infty}\Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl| g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k\Bigr|^2 du.$

Thus we have reduced the problem of minimizing the mean-square difference
on the left of (19) to that of minimizing the integral on the right of (19). By
rearranging the sum in the integrand, we see that an equivalent problem is to
minimize

(20)    $D = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl| g(u) - \varphi(u)\sum_{k=0}^{N} c_k^{(N)} e^{iku}\Bigr|^2 du,$

where the $c_k^{(N)}$ and $a_k^{(N)}$ are readily expressed in terms of each other; in fact,

(21)    $a_k^{(N)} = (-1)^k\sum_{l=k}^{N}\binom{l}{k} c_l^{(N)}.$
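The passage between the two sets of coefficients is a binomial rearrangement. As a worked step (a sketch in LaTeX of the standard computation, not quoted from the paper), expand each exponential in powers of $1 - e^{iu}$ and collect terms:

```latex
e^{ilu} = \bigl(1 - (1 - e^{iu})\bigr)^{l}
        = \sum_{k=0}^{l} (-1)^k \binom{l}{k} (1 - e^{iu})^k ,
\qquad\text{so}\qquad
\sum_{l=0}^{N} c_l^{(N)} e^{ilu}
  = \sum_{k=0}^{N} \Bigl[ (-1)^k \sum_{l=k}^{N} \binom{l}{k}\, c_l^{(N)} \Bigr] (1 - e^{iu})^k .
```

Comparing the bracket with the coefficient of $(1 - e^{iu})^k$ in the integrand of (19) gives the relation between $a_k^{(N)}$ and $c_l^{(N)}$ stated in (21).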

Since $|\varphi(u)| = e^{-\lambda + \lambda\cos u} \ge e^{-2\lambda} > 0$, we can write D in the form

$D = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku}\Bigr|^2 |\varphi(u)|^2\,du,$

so that

(22)    $e^{-4\lambda}\int_{-\pi}^{\pi}\Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku}\Bigr|^2 du \;\le\; 2\pi D \;\le\; \int_{-\pi}^{\pi}\Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku}\Bigr|^2 du,$

since $e^{-2\lambda} \le |\varphi(u)| \le 1$, and hence $e^{-4\lambda} \le |\varphi(u)|^2 \le 1$. Thus we can make D
arbitrarily small if and only if we can make

(23)    $D^* = (2\pi)^{-1}\int_{-\pi}^{\pi}\Bigl| g(u)/\varphi(u) - \sum_{k=0}^{N} c_k^{(N)} e^{iku}\Bigr|^2 du$

arbitrarily small. Now the Fourier coefficients of $g(u)$ are $f(x)$; those of $1/\varphi(u)$
are $e^{\lambda}(-\lambda)^x/x!$ for $x \ge 0$, 0 for $x < 0$; by the convolution theorem for Fourier
coefficients [31, p. 90] the nth Fourier coefficient of $g(u)/\varphi(u)$ is

(24)    $\sum_{k=0}^{n} f(n - k)\,e^{\lambda}(-\lambda)^k/k!, \qquad n = 0, 1, 2, \ldots,$

and zero for $n < 0$. Furthermore, it is well known from the theory of Fourier
series that $D^*$ is a minimum if the $c_k^{(N)}$ are chosen as the first N + 1 Fourier
coefficients of $g(u)/\varphi(u)$, and that this minimum is arbitrarily small for large enough N
if and only if the Fourier coefficients of $g(u)/\varphi(u)$ are zero for negative indices,
which is in fact the case. If we then take the values (24) for $c_k^{(N)}$, k = 0, 1, ...,
N, and express $a_k^{(N)}$ in terms of $c_l^{(N)}$ by (21), we arrive at the formula (14).

It will be observed that the minimum D is connected with the minimum $D^*$ by

$\min D \;\le\; \max|\varphi(u)|^2\cdot\min D^* \;\le\; \min D^* \;\le\; \frac{1}{\min|\varphi(u)|^2}\,\min D \;\le\; e^{4\lambda}\min D,$

so that all that we can say about the approximation given by (14) with a small N
is that it is an upper bound for the best possible mean-square approximation by
sums (18), and that the best mean-square approximation is at worst $e^{-4\lambda}$ times
it. This means that if $D^*$ is small, so is D; but $D^*$ is not necessarily small even if
D is. Hence we cannot in general expect the coefficients (14) to be suitable for
practical curve-fitting, since they may increase the mean-square error by a
factor of as much as $e^{4\lambda}$; we may, however, expect (14) to be better when $\lambda$
is small.

Now, as we have already observed,

$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)$

is the xth Fourier coefficient of

$g(u) - \varphi(u)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{iu})^k;$

if we write (18) in the form

(25)    $f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi} e^{-ixt}\Bigl[g(t)/\varphi(t) - \sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\Bigr]\varphi(t)\,dt,$

and choose the $a_k^{(N)}$ as specified above, the expression in square brackets is
$g(t)/\varphi(t)$ minus the first N + 1 terms of its Fourier series, and so the Fourier
series of $[\cdots]$ involves no $e^{ikt}$ with $k < N + 1$. Since the Fourier series of $\varphi(t)$
involves no $e^{ikt}$ with $k < 0$, the product $\varphi(t)[\cdots]$ also involves no $e^{ikt}$ with
$k < N + 1$, and therefore the integral in (25) is zero for x = 0, 1, 2, ..., N
(since it represents the xth Fourier coefficient of $\varphi(t)[\cdots]$). In other words,

$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = 0, \qquad x = 0, 1, 2, \ldots, N.$

Furthermore, we can compute $f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)$ for $x > N$ by the
convolution formula from the Fourier series of $\varphi(t)$ and $[\cdots]$; for $n > N$, the nth Fourier
coefficient of $[\cdots]$ is just that of $g(t)/\varphi(t)$, given by (24), and that of $\varphi(t)$ is
$e^{-\lambda}\lambda^n/n!$, so for $x > N$

$f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x) = \sum_{l=N+1}^{x}\Bigl(\sum_{k=0}^{l} f(l - k)\,e^{\lambda}(-\lambda)^k/k!\Bigr)\theta(x - l),$

and in particular

$f(N + 1) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(N + 1) = \sum_{k=0}^{N+1} f(N + 1 - k)\,(-\lambda)^k/k!.$
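As a worked instance of the last formula, take the observed frequencies of Table 1 with $\lambda = .631$ and N = 2, assuming (as the table suggests) that the origin is shifted so that 5 petals corresponds to x = 0. The arithmetic, written out in LaTeX, is:

```latex
f(3) - \sum_{k=0}^{2} a_k^{(2)} \Delta^k\theta(3)
  = f(3) - \lambda f(2) + \frac{\lambda^2}{2} f(1) - \frac{\lambda^3}{6} f(0)
  = 7 - 23(0.631) + \frac{55\,(0.631)^2}{2} - \frac{133\,(0.631)^3}{6}
  \approx 7 - 14.51 + 10.95 - 5.57 \approx -2.13 .
```

The approximating sum at that point is therefore about 7 + 2.13 = 9.13, in agreement with the value 9.1 shown for 8 petals in column 6 of Table 1.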


9. Proofs: practical problem. We have so far obtained only an estimate for
the minimum of D, by obtaining the minimum of $D^*$; this estimate is satisfactory
for large N and so for theoretical purposes. However, to obtain precisely the
best mean-square approximation to $f(x)$ by a small number N of terms of the
sum in (18), we have to choose $a_k^{(N)}$ so that

$\sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\,\varphi(t)$

is the first N + 1 terms of the expansion of $g(t)$ in terms of the set of functions
obtained by replacing $(1 - e^{it})^k\varphi(t)$, k = 0, 1, 2, ..., by an equivalent
orthonormal set. The process for obtaining this orthonormal set is well known; it
turns out that the integrals that have to be evaluated are expressible in terms of
Bessel functions of imaginary argument; the result is that the first orthonormal
functions are

$\psi_0(t) = (2\pi)^{-1/2}\,\alpha_0^{-1/2}\exp(\lambda e^{it}),$

$\psi_1(t) = (2\pi)^{-1/2}\,\frac{\alpha_0 e^{it} - \alpha_1}{[\alpha_0(\alpha_0^2 - \alpha_1^2)]^{1/2}}\exp(\lambda e^{it}),$

$\psi_2(t) = (2\pi)^{-1/2}\,\frac{\alpha_1^2 - \alpha_0\alpha_2 - \alpha_1(\alpha_0 - \alpha_2)e^{it} - (\alpha_1^2 - \alpha_0^2)e^{2it}}{[(\alpha_1^2 - \alpha_0^2)(\alpha_0 - \alpha_2)(2\alpha_1^2 - \alpha_0^2 - \alpha_0\alpha_2)]^{1/2}}\exp(\lambda e^{it}),$

where $\alpha_0 = J_0(2i\lambda)$, $\alpha_1 = -iJ_1(2i\lambda)$, $\alpha_2 = -J_2(2i\lambda)$. It is then a simple matter,
first to express $\psi_0$, $\psi_1$, $\psi_2$ in terms of $\varphi(t)$, $\varphi(t)(1 - e^{it})$, $\varphi(t)(1 - e^{it})^2$, and then
to determine $a_0^{(0)}$; $a_0^{(1)}$, $a_1^{(1)}$; and $a_0^{(2)}$, $a_1^{(2)}$, $a_2^{(2)}$. For example, the best two-term
approximation for $g(u)$ in terms of $\psi_0(u)$, $\psi_1(u)$ is

$g(u) \approx \psi_0(u)\int_{-\pi}^{\pi} g(u)\,\overline{\psi_0(u)}\,du + \psi_1(u)\int_{-\pi}^{\pi} g(u)\,\overline{\psi_1(u)}\,du,$

and the integrals $\int g(u)\,\overline{\psi_k(u)}\,du$ are combinations of terms of the form

$(2\pi)^{-1}\int_{-\pi}^{\pi} g(u)\,e^{-iku}\,\overline{\varphi(u)}\,du;$

these in turn are Fourier coefficients of $g(u)\overline{\varphi(u)}$ and so are expressible, by the
Parseval formula, as sums of products of the Fourier coefficients of $g(u)$ (namely, $f(n)$)
and of $\varphi(u)$ (namely, $\theta(n)$). We omit the algebraic work; the results are given in
formulas (9), (10), (11).
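The Bessel functions enter through the inner products of the functions being orthogonalized. As a sketch of that step in LaTeX (using the standard integral representation of the modified Bessel function $I_m$, with $I_m(y) = i^{-m}J_m(iy)$; the symbol $I_m$ is introduced here for convenience and is not the paper's notation):

```latex
\int_{-\pi}^{\pi} e^{ijt}\exp(\lambda e^{it})\,
      \overline{e^{ikt}\exp(\lambda e^{it})}\,dt
  = \int_{-\pi}^{\pi} e^{i(j-k)t}\, e^{2\lambda\cos t}\,dt
  = 2\pi\, I_{|j-k|}(2\lambda),
\qquad
I_0(2\lambda) = J_0(2i\lambda),\quad
I_1(2\lambda) = -iJ_1(2i\lambda),\quad
I_2(2\lambda) = -J_2(2i\lambda).
```

The Gram matrix of the first three functions is thus $2\pi$ times a symmetric matrix with every diagonal entry equal to $J_0(2i\lambda)$, first off-diagonal entries $-iJ_1(2i\lambda)$ and corner entries $-J_2(2i\lambda)$, and the usual orthogonalization process applied to it yields an orthonormal set expressed through these three quantities.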


10. Proofs: continuous case. In the continuous case of our approximation
problem we assume that $|f(x)|^2$ is integrable on $(-\infty, \infty)$ and look for
coefficients $a_k^{(N)}$ that will minimize

$D = \int_{-\infty}^{\infty}\Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 dx,$

where

$\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}\,du,$

$\Delta^k\theta(x) = (2\pi)^{-1}\int_{-\pi}^{\pi}\varphi(u)\,e^{-ixu}(1 - e^{iu})^k\,du.$

Let $f(x)$ be the Fourier transform of $g(u)$; we can regard $\theta(x)$ as the Fourier
transform of $\varphi(u)$, $\varphi(u)$ being defined as zero outside $(-\pi, \pi)$. Then by Parseval's
theorem for Fourier transforms we have

$2\pi D = \int_{|t|>\pi}|g(t)|^2\,dt + \int_{-\pi}^{\pi}\Bigl| g(t) - \varphi(t)\sum_{k=0}^{N} a_k^{(N)}(1 - e^{it})^k\Bigr|^2 dt.$

Clearly, then, D cannot be made arbitrarily small unless $g(t) = 0$ almost
everywhere outside $(-\pi, \pi)$; and if this condition is satisfied, D reduces to the same
form which it had in the discrete case (see (19)). Thus the problem of mean-square
approximation in the continuous case reduces, if it can be solved at all, to the
corresponding problem in the discrete case.


11. Representation by a series. We consider the representation of a given
$f(x)$ by the B-series with the classical coefficients (5), but with mean-square
convergence of the series. Here we assume that $f(x) \ge 0$, $f(x) = 0$ for x =
-1, -2, ..., and $\sum [f(x)]^2 < \infty$, and ask whether we can have

(26)    $\lim_{n\to\infty}\sum_{x=0}^{\infty}\Bigl| f(x) - \sum_{k=0}^{n} a_k\,\Delta^k\theta(x)\Bigr|^2 = 0,$

where here the $a_k$ do not depend on n (but are not, in principle, required to have
the form (5)). From our previous discussion this is equivalent to

$\lim_{n\to\infty}\int_{-\pi}^{\pi}\Bigl| g(t) - \varphi(t)\sum_{k=0}^{n} a_k(1 - e^{it})^k\Bigr|^2 dt = 0,$

and this implies that

$\lim_{n\to\infty}|a_n|^2\int_{-\pi}^{\pi}|\varphi(t)|^2\,|1 - e^{it}|^{2n}\,dt = 0.$

From this it follows easily that

$\sum_{n=0}^{\infty} a_n(1 - e^{it})^n$

converges for $|t| < \pi$, or in other words that

$H(z) = \sum_{n=0}^{\infty} a_n(1 - z)^n$

converges on $|z| = 1$ except perhaps for z = -1, and hence converges in
$|1 - z| < 2$. By analytic continuation it is easy to identify $H(z)$ with $F(z)/\Phi(z)$,
where for $|z| < 1$,

$F(z) = \sum_{n=0}^{\infty} f(n)z^n, \qquad \Phi(z) = \sum_{n=0}^{\infty}\theta(n)z^n = e^{\lambda(z-1)}.$

Since $1/\Phi(z)$ has no singular points, $F(z)$ is analytic in $|1 - z| < 2$ and hence in
particular in $0 \le z < 3$; since $F(z)$ is a power series with nonnegative coefficients,
it has a singular point at the positive real point on its circle of convergence
[30, p. 214], and so it must be analytic at least in $|z| < 3$. This gives the
restriction $\limsup_{n\to\infty}|f(n)|^{1/n} \le 1/3$. Nevertheless, as we know, $f(x)$ is represented
in mean-square by a sequence of sums of terms $a_k^{(N)}\Delta^k\theta(x)$ even if we assume
only that $\sum|f(n)|^2$ converges.
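A simple illustration of how much is lost by the restriction (a worked example in LaTeX, chosen here for illustration and not taken from the paper) is the geometric distribution with ratio $r$, $0 < r < 1$:

```latex
f(n) = (1 - r)\,r^{n}, \qquad
F(z) = \sum_{n=0}^{\infty} f(n) z^{n} = \frac{1 - r}{1 - r z}, \qquad
\limsup_{n\to\infty} |f(n)|^{1/n} = r , \qquad
\sum_{n=0}^{\infty} |f(n)|^{2} = \frac{(1 - r)^{2}}{1 - r^{2}} < \infty .
```

Here $F(z)$ has its only singular point at $z = 1/r$, so analyticity in $|z| < 3$ requires $r \le 1/3$, whereas the sum of squares converges for every $r < 1$. For $1/3 < r < 1$ the geometric distribution therefore admits the mean-square approximation by sums with coefficients depending on N, although its B-series with fixed coefficients cannot converge in mean square.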

In the continuous case, if $f(x) \ge 0$ and we have

(27)    $\lim_{n\to\infty}\int_{-\infty}^{\infty}\Bigl| f(x) - \sum_{k=0}^{n} a_k\,\Delta^k\theta(x)\Bigr|^2 dx = 0,$

we must have $g(x) = 0$ almost everywhere outside $(-\pi, \pi)$ and then, as we saw
previously, (26) holds also. Now since $f(x) \ge 0$, $g(t)$ has derivatives of all orders
if it has derivatives of all orders at t = 0 [29, p. 90], and it is easily seen from
this that $g(t)$ is analytic for all real t if it is analytic at t = 0. Now on the one
hand, unless $f(x) \equiv 0$, $g(t)$ cannot be analytic for all real t if (as we are supposing)
$g(t)$ vanishes outside $(-\pi, \pi)$. On the other hand, $H(e^{it}) = g(t)/\varphi(t)$ for real
values of t close to 0 and so, if t is regarded as a complex variable, for complex
values of t near 0. Since $1/\varphi(t)$ is analytic everywhere, $g(t)$ is analytic at t = 0.
From this contradiction we infer that a nonnegative $f(x)$ can never be represented
in the form (27), although it may perfectly well be represented by

$\lim_{N\to\infty}\int_{-\infty}^{\infty}\Bigl| f(x) - \sum_{k=0}^{N} a_k^{(N)}\Delta^k\theta(x)\Bigr|^2 dx = 0.$





REFERENCES

I. Charlier series

[1] A. C. Aitken, Statistical Mathematics, Oliver and Boyd, London, 1939, pp. 68-69, 66-67, 76-79.
[1a] W. Andersson, "Short notes on Charlier's method for expansion of frequency functions in series," Skandinavisk Aktuar. Tids., Vol. 27 (1944), pp. 16-31.
[2] L. A. Aroian, "The type B Gram-Charlier series," Annals of Math. Stat., Vol. 8 (1937), pp. 183-192.
[3] R. P. Boas, Jr., "The Charlier B-series," Am. Math. Soc. Trans. (to appear).
[4] C. V. L. Charlier, "Über das Fehlergesetz," Ark. Mat. Astr. Fys., Vol. 2, no. 8 (1905).
[5] C. V. L. Charlier, "Die zweite Form des Fehlergesetzes," ibid., no. 15 (1905).
[6] C. V. L. Charlier, "Über die Darstellung willkürlicher Funktionen," ibid., no. 20 (1905).
[7] C. V. L. Charlier, "Researches into the theory of probability," Lunds Universitets Årsskrift N. F., Afd. 2, Vol. 1, no. 6 (1906).
[8] H. Cramér, Random Variables and Probability Distributions, Cambridge Tracts in Mathematics and Mathematical Physics, no. 36, Cambridge University Press, 1937.
[9] W. P. Elderton, Frequency Curves and Correlation, 3d ed., Cambridge University Press, 1938, pp. 131-132.
[10] A. Fisher, An Elementary Treatise on Frequency Curves and their Application in the Analysis of Death Curves and Life Tables, Macmillan, New York, 1922.
[11] A. Fisher, The Mathematical Theory of Probabilities and Its Application to Frequency Curves and Statistical Methods, Vol. 1, 2d ed., Macmillan, New York, 1922, pp. 271-279.
[12] M. Jacob, "Über die Charlier'sche B-Reihe," Skandinavisk Aktuar. Tids., Vol. 15 (1932), pp. 286-291.
[13] C. Jordan, "Sur la probabilité des épreuves répétées, le théorème de Bernoulli et son inversion," Bull. Soc. Math. France, Vol. 54 (1926), pp. 101-137.
[14] N. R. Jørgensen, "Note sur la fonction de répartition de type B de M. Charlier," Ark. Mat. Astr. Fys., Vol. 10, no. 15 (1915).
[15] N. R. Jørgensen, "Undersøgelser over Frequensflader og Korrelation," Copenhagen thesis, 1916.
[16] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, 3d ed., Griffin, London, 1947, pp. 154-156.
[17] S. Kullback, "On the Charlier type B series," Annals of Math. Stat., Vol. 18 (1947), pp. 574-581; Vol. 19 (1948), p. 127.
[18] R. von Mises, Vorlesungen aus dem Gebiete der angewandten Mathematik, Vol. 1, Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik, Deuticke, Leipzig and Vienna, 1931, pp. 265-269.
[19] N. Obrechkoff, Sur la loi de Poisson, la série de Charlier et les équations linéaires aux différences finies du premier ordre à coefficients constants, Actual. Sci. Ind., no. 740, Hermann et Cie., Paris (1938), pp. 35-64.
[20] H. Pollaczek-Geiringer, "Die Charlier'sche Entwicklung willkürlicher Verteilungen," Skandinavisk Aktuar. Tids., Vol. 11 (1928), pp. 98-111.
[21] H. Pollaczek-Geiringer, "Über die Poissonsche Verteilung und die Entwicklung willkürlicher Verteilungen," Zeits. f. Angew. Math. und Mech., Vol. 8 (1928), pp. 292-309.
[22] H. L. Rietz, Mathematical Statistics, Carus Mathematical Monographs, no. 3, Chicago, 1927, pp. 60-68, 170-177.
[23] E. Schmidt, "Über die Charliersche Entwicklung einer 'arithmetischen Verteilung' nach den sukzessiven Differenzen der Poissonschen asymptotischen Darstellungsfunktion für die Wahrscheinlichkeit seltener Ereignisse," Sitzungsberichte der Preussischen Akademie der Wissenschaften, Phys.-Math. Kl., 1928, p. 148.
[24] E. Schmidt, "Über die Charlier-Jordansche Entwicklung einer willkürlichen Funktion nach der Poissonschen Funktion und ihren Ableitungen," Zeits. f. Angew. Math. und Mech., Vol. 13 (1933), pp. 139-142.
[25] H. L. Selberg, "Über die Darstellung willkürlicher Funktionen durch Charliersche Differenzreihen," Skandinavisk Aktuar. Tids., Vol. 25 (1942), pp. 228-246.
[26] H. L. Selberg, "Über die Darstellung der Dichtefunktion einer Verteilung durch eine Charliersche B-Reihe," Archiv for Mathematik og Naturvidenskab, Vol. 46 (1943), pp. 127-138.
[27] J. F. Steffensen, Some Recent Researches in the Theory of Statistics and Actuarial Science, Cambridge University Press, 1930, pp. 35-48.
[27a] J. F. Steffensen, "Note sur la fonction de type B de M. Charlier," Svenska Aktuar. Tids., Vol. 3 (1916), pp. 226-228.
[28] J. V. Uspensky, "On Ch. Jordan's series for probability," Annals of Math., Series 2, Vol. 32 (1931), pp. 306-312.

II. Other references

[29] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.
[30] E. C. Titchmarsh, The Theory of Functions, Oxford, 1932.
[31] A. Zygmund, Trigonometrical Series, Warszawa-Lwów, 1935.
[32] Table of the Bessel Functions $J_0(z)$ and $J_1(z)$ for Complex Arguments, prepared by the Mathematical Tables Project, National Bureau of Standards, 2d ed., Columbia University Press, New York, 1947.



HEURISTIC APPROACH TO THE KOLMOGOROV-SMIRNOV 

THEOREMS 1 

By J. L. Doob 
University of Illinois 

1. Introduction and summary. Asymptotic theorems on the difference between
the (empirical) distribution function calculated from a sample and the true
distribution function governing the sampling process are well known. Simple
proofs of an elementary nature have been obtained for the basic theorems of
Kolmogorov^2 and Smirnov^3 by Feller,^4 but even these proofs conceal to some
extent, in their emphasis on elementary methodology, the naturalness of the
results (qualitatively at least), and their mutual relations. Feller suggested that
the author publish his own approach (which had also been used by Kac), which
does not have these disadvantages, although rather deep analysis would be
necessary for its rigorous justification. The approach is therefore presented (at
one critical point) as heuristic reasoning which leads to results in investigations of
this kind, even though the easiest proofs may use entirely different methods.

No calculations are required to obtain the qualitative results, that is the 
existence of limiting distributions for large samples of various measures of the 
discrepancy between empirical and true distribution functions. The numerical 
evaluation of these limiting distributions requires certain results concerning the 
Brownian movement stochastic process and its relation to other Gaussian 
processes which will be derived in the Appendix. 

2. The problem. Let $x_1, x_2, \cdots$ be mutually independent random variables
with a common distribution function $F(\lambda)$,

$$F(\lambda) = \Pr\{x_j \le \lambda\}.$$

In statistical language $x_1, \cdots, x_n$ form a sample of $n$ drawn from the distribution
with distribution function $F(\lambda)$. Let $\nu_n(\lambda)$ be the number of these $x_j$'s which
are $\le \lambda$. According to the strong law of large numbers, for each $\lambda$

$$\lim_{n \to \infty} \frac{\nu_n(\lambda)}{n} = F(\lambda) \tag{2.1}$$

with probability 1. For fixed $n$, $\nu_n(\lambda)/n$ is itself a distribution function (which
depends on the sample values $x_1, \cdots, x_n$), the empirical distribution function,
and an elaboration of the argument which led to (2.1) shows that (2.1) is true

1 Research connected with a probability project at Cornell University under an ONR
contract.

2 Inst. Ital. Attuari, Giorn., Vol. 4 (1933), pp. 83-91.

3 Rec. Math. (Matematiceskii Sbornik), N.S. 6, Vol. 48 (1939), pp. 3-26; Bull. Math. Univ.
Moscou, Vol. 2 (1939), fasc. 2.

4 Annals of Math. Stat., Vol. 19 (1948), pp. 177-189.



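uniformly in $\lambda$, with probability 1; that is, if

$$D_n = \operatorname*{L.U.B.}_{-\infty < \lambda < \infty} \left| \frac{\nu_n(\lambda)}{n} - F(\lambda) \right| \tag{2.2}$$

then $D_n$ is a random variable and

$$\lim_{n \to \infty} D_n = 0$$

with probability 1.^5 This result would be of limited practical statistical importance
except that the distribution of $D_n$ does not depend on the distribution function
$F(\lambda)$ if $F(\lambda)$ is continuous. In fact in that case the random variables $F(x_1)$,
$F(x_2), \cdots$ are mutually independent and each is uniformly distributed in the
interval (0, 1); if $\bar{\nu}_n(\lambda)$ is the number of $F(x_j)$'s $\le \lambda$, for $j \le n$,

$$\operatorname*{L.U.B.}_{0 \le \mu \le 1} \left| \frac{\bar{\nu}_n(\mu)}{n} - \mu \right| = \operatorname*{L.U.B.}_{-\infty < \lambda < \infty} \left| \frac{\nu_n(\lambda)}{n} - F(\lambda) \right|.$$

Thus it is no restriction, replacing $x_j$ by $F(x_j)$ if necessary, in finding the distribution
of $D_n$ to assume that $F(\lambda) = \lambda$ for $0 \le \lambda \le 1$, and

$$D_n = \operatorname*{L.U.B.}_{0 \le \lambda \le 1} \left| \frac{\nu_n(\lambda)}{n} - \lambda \right|. \tag{2.2'}$$

The results will hold for $D_n$ defined by (2.2) for any continuous $F(\lambda)$. We shall
also consider $D_n^+$ and $D_n^-$, defined by

$$D_n^+ = \operatorname*{L.U.B.}_{0 \le \lambda \le 1} \left[ \frac{\nu_n(\lambda)}{n} - \lambda \right],$$

$$D_n^- = -\operatorname*{G.L.B.}_{0 \le \lambda \le 1} \left[ \frac{\nu_n(\lambda)}{n} - \lambda \right],$$

and again the results will hold (with the obvious definitions of $D_n^+$ and $D_n^-$ in the
general case) for every continuous $F(\lambda)$.

The problem is to find the limiting distributions of (properly normalized)
$D_n$, $D_n^+$, $D_n^-$ when $n \to \infty$.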
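The three discrepancy measures can be computed directly from a sample. The following is a minimal sketch, assuming the sample has already been reduced to the uniform case $F(\lambda) = \lambda$ as in (2.2'); the function name `ks_statistics` is a hypothetical choice for illustration and does not come from the paper. It uses the fact that the empirical distribution function is a step function, so the extremes of $\nu_n(\lambda)/n - \lambda$ are found at the ordered sample values.

```python
import random


def ks_statistics(sample):
    """Return (D_n, D_n_plus, D_n_minus) for a sample assumed uniform on (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    # nu_n(lam)/n jumps from (i-1)/n to i/n at the i-th ordered value, so
    # nu_n(lam)/n - lam is largest at a jump and smallest just before one.
    d_plus = max(i / n - x for i, x in enumerate(xs, start=1))
    d_minus = max(x - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus), d_plus, d_minus


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    sample = [rng.random() for _ in range(1000)]
    d_n, d_plus, d_minus = ks_statistics(sample)
    # Normalized values n^(1/2) D_n etc., the quantities whose limits are sought.
    print([len(sample) ** 0.5 * d for d in (d_n, d_plus, d_minus)])
```

For a general continuous $F$, one would first replace each $x_j$ by $F(x_j)$, as the text describes.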


3. Derivation of the Kolmogorov and Smirnov theorems. Define

$$x_n(t) = n^{1/2} \left[ \frac{\nu_n(t)}{n} - t \right], \qquad 0 \le t \le 1.$$

Since $\nu_n(0) = 0$ with probability 1 and $\nu_n(t) - \nu_n(s)$ is the number of successes
in $n$ independent trials, with probability $t - s$ of success in each trial,
$\nu_n(t) - \nu_n(s)$ has expectation $n(t - s)$ and variance $n(t - s)[1 - (t - s)]$. Hence

$$E\{x_n(t)\} = 0, \qquad 0 \le t \le 1; \tag{3.1}$$

$$E\{[x_n(t) - x_n(s)]^2\} = (t - s)[1 - (t - s)], \qquad 0 \le s \le t \le 1.$$

5 Cf. M. Fréchet, Généralités sur les probabilités. Variables aléatoires, Paris, 1937, pp.
260-261.





Now let $\{x(t)\}$ be a one parameter family of random variables, $0 \le t \le 1$,
with the following properties:

(a) for each $j$, if $0 \le t_1 < \cdots < t_j \le 1$ the $j$-variate distribution of the random
variables $x(t_1), \cdots, x(t_j)$ is Gaussian;

(b) (3.1) holds, that is

$$E\{x(t)\} = 0, \qquad 0 \le t \le 1; \tag{3.1'}$$

$$E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \qquad 0 \le s \le t \le 1;$$

(c) $\Pr\{x(0) = 0\} = 1$.

According to the central limit theorem, the $j$-variate distribution of
$x_n(t_1), \cdots, x_n(t_j)$ is asymptotically that of $x(t_1), \cdots, x(t_j)$; in fact the normalizing
factor $n^{1/2}$ in the definition of $x_n(t)$ and the choice of means and variances in (3.1')
were made precisely to bring this about. As far as first and second moments are
concerned the $x_n(t)$ and $x(t)$ processes are identical; when $n \to \infty$ the distributions,
or at least the $j$-variate ones mentioned, become identical also.

We shall assume, until a contradiction frustrates our devotion to heuristic
reasoning, that in calculating asymptotic $x_n(t)$ process distributions when $n \to \infty$
we may simply replace the $x_n(t)$ processes by the $x(t)$ process. It is clear that this
cannot be done in all possible situations, but let the reader who has never used
this sort of reasoning exhibit the first counter example.

The $x(t)$ process has continuous sample functions (cf. Appendix). Define

$$D = \max_{0 \le t \le 1} |x(t)|,$$

$$D^+ = \max_{0 \le t \le 1} x(t),$$

$$D^- = -\min_{0 \le t \le 1} x(t).$$

Then in accordance with our substitution principle $n^{1/2} D_n$, $n^{1/2} D_n^+$, $n^{1/2} D_n^-$ have as $n$
becomes infinite the distributions of $D$, $D^+$, $D^-$ respectively. (The latter two
are the same because the $-x(t)$ process is stochastically identical with the $x(t)$
process.) Thus these simple qualitative considerations have led to the existence
of the limiting distributions derived and evaluated by Kolmogorov, who proved:

Theorem^6 (Kolmogorov).

$$\lim_{n \to \infty} \Pr\{n^{1/2} D_n \ge \lambda\} = 2 \sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2}, \tag{3.2}$$

$$\lim_{n \to \infty} \Pr\{n^{1/2} D_n^+ \ge \lambda\} = \lim_{n \to \infty} \Pr\{n^{1/2} D_n^- \ge \lambda\} = e^{-2\lambda^2}. \tag{3.3}$$

To complete our treatment we shall prove in the Appendix that

$$\Pr\{D \ge \lambda\} = 2 \sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2}, \tag{3.2'}$$

6 In Feller's paper (loc. cit., p. 178, equation (1.4)) the factor 2 in the exponent was
omitted by the printer. The same misprint occurs in Smirnov's table of the values of the
series in our (3.2), Annals of Math. Stat., Vol. 19 (1948), pp. 279-281.





$$\Pr\{D^+ \ge \lambda\} = \Pr\{D^- \ge \lambda\} = e^{-2\lambda^2}, \tag{3.3'}$$

so that in fact the above considerations have led not only to the existence but
to the evaluation of the asymptotic distributions. (Actually we shall prove
somewhat more general results about the $x(t)$ process.)
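The limiting tail probabilities in (3.2) and (3.3) are easy to evaluate numerically. The sketch below is only an illustration: the function names and the truncation parameter `terms` are assumptions introduced here, not part of the paper. The alternating series converges very quickly unless $\lambda$ is close to zero, in which case many more terms would be needed.

```python
import math


def kolmogorov_tail(lam, terms=100):
    """Truncated right side of (3.2): 2 * sum_{m>=1} (-1)^(m+1) exp(-2 m^2 lam^2)."""
    if lam <= 0:
        return 1.0  # D is non-negative, so the probability of D >= lam is 1
    total = 0.0
    for m in range(1, terms + 1):
        total += (-1) ** (m + 1) * math.exp(-2.0 * m * m * lam * lam)
    return 2.0 * total


def one_sided_tail(lam):
    """Right side of (3.3): exp(-2 lam^2), for lam >= 0."""
    return math.exp(-2.0 * lam * lam)


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.36, 2.0):
        print(lam, kolmogorov_tail(lam), one_sided_tail(lam))
```

For moderate $\lambda$ the two-sided tail is close to twice the one-sided tail, since the first term of the series dominates.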

So much for the Kolmogorov theorems. Smirnov obtained results (also
independent of the given continuous distribution function $F(\lambda)$) of a somewhat
different nature. Let $x_1^*, x_2^*, \cdots$ be mutually independent random variables
with the same individual distributions as the $x_j$'s, that is each distributed
uniformly in the interval (0, 1); define $\nu_n^*(\lambda)$ as the number of the first $n$ $x_j^*$'s
which are $\le \lambda$. Smirnov considered the difference between empirical distribution
functions,

$$D_{mn} = \operatorname*{L.U.B.}_{0 \le \lambda \le 1} \left| \frac{\nu_m(\lambda)}{m} - \frac{\nu_n^*(\lambda)}{n} \right|,$$

as well as $D_{mn}^+$ and $D_{mn}^-$ defined in the obvious way. To avoid stressing the
obvious we consider only the $D_{mn}$.

Theorem (Smirnov). If $m, n \to \infty$ in such a way that $m/n \to r$, and if
$N = mn/(m + n)$,

$$\lim \Pr\{N^{1/2} D_{mn} \ge \lambda\} = 2 \sum_{k=1}^{\infty} (-1)^{k+1} e^{-2k^2\lambda^2}. \tag{3.4}$$

To derive this result define an $x^*(t)$ process stochastically identical with the
$x(t)$ process but independent of it. Then if $x_n^*(t)$ is defined by

$$x_n^*(t) = n^{1/2} \left[ \frac{\nu_n^*(t)}{n} - t \right],$$

we identify, in accordance with our heuristic principle, the process with variables

$$\{x(t) - r^{1/2} x^*(t)\}$$

with the one with variables

$$\left\{ x_m(t) - \left( \frac{m}{n} \right)^{1/2} x_n^*(t) \right\}.$$

Doing this leads to the fact that the distribution of

$$N^{1/2} D_{mn} = \left( \frac{n}{m + n} \right)^{1/2} \operatorname*{L.U.B.}_{0 \le t \le 1} \left| x_m(t) - \left( \frac{m}{n} \right)^{1/2} x_n^*(t) \right|$$

converges to that of

$$\frac{1}{(1 + r)^{1/2}} \max_{0 \le t \le 1} \left| x(t) - r^{1/2} x^*(t) \right|.$$

Now the $x(t)$ process and the process with variables

$$\left\{ \frac{x(t) - r^{1/2} x^*(t)}{(1 + r)^{1/2}} \right\}$$

are stochastically identical. Hence we are led to the conclusion that the distribution
of $N^{1/2} D_{mn}$ converges to that of $D$, and this is Smirnov's theorem,
stated above. (The method we use does not seem applicable to Smirnov's deeper
theorems on the number of intersections between empirical and true distribution
curves or between pairs of empirical distribution curves.)
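The two-sample statistic and its normalization can be sketched as follows. This is an illustrative sketch only; the function name `smirnov_statistic` is hypothetical. It evaluates the difference of the two empirical distribution functions at every observed value, which is sufficient because both are step functions that change only at those values.

```python
import math
from bisect import bisect_right


def smirnov_statistic(xs, ys):
    """Return (D_mn, N^(1/2) * D_mn) with N = m n / (m + n)."""
    xs, ys = sorted(xs), sorted(ys)
    m, n = len(xs), len(ys)
    # bisect_right counts the observations <= p, i.e. nu_m(p) and nu*_n(p).
    d_mn = max(
        abs(bisect_right(xs, p) / m - bisect_right(ys, p) / n)
        for p in xs + ys
    )
    big_n = m * n / (m + n)
    return d_mn, math.sqrt(big_n) * d_mn


if __name__ == "__main__":
    print(smirnov_statistic([0.1, 0.3, 0.8], [0.2, 0.4, 0.5, 0.9]))
```

The second returned value is the quantity whose limiting distribution is given by (3.4).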

APPENDIX 

4. The Brownian movement process. Consider any Gaussian stochastic
process, with random variables $\{x(t)\}$ where $t$ varies in some interval. That
is, we assume that for each $t$ in the interval $x(t)$ is a random variable and that
for any $j \ge 1$ if $t_1 < \cdots < t_j$ are in the interval the $j$-variate distribution of
$x(t_1), \cdots, x(t_j)$ is Gaussian. In the following we shall always assume that
$E\{x(t)\} = 0$. Then the process is determined stochastically by the covariance
function

$$r(s, t) = E\{x(s)x(t)\}.$$

In particular, if the range of parameter is the interval $[0, \infty)$ and if

$$r(s, t) = \sigma^2 \operatorname{Min}(s, t), \qquad 0 \le s, t < \infty,$$

the process is called the Brownian movement process, or sometimes the Wiener
process; $\sigma$ is a positive constant. When considering this process we shall write
$\xi(t)$ instead of $x(t)$. For the $\xi(t)$ process

$$\Pr\{\xi(0) = 0\} = 1,$$

$$E\{[\xi(t) - \xi(s)]^2\} = \sigma^2 |t - s|,$$

and if $0 \le s_1 < t_1 \le s_2 < t_2$ the increments $\xi(t_1) - \xi(s_1)$ and $\xi(t_2) - \xi(s_2)$ are
mutually independent. We shall use the following properties of this process, of
which the first two are well known.

(a) The sample functions are everywhere continuous with probability 1. In
the following we can therefore write as if all the sample curves were continuous.

(b) For fixed $s$

$$\Pr\{\max_{0 \le t \le T} [\xi(s + t) - \xi(s)] \ge \lambda\} = 2 \Pr\{\xi(s + T) - \xi(s) \ge \lambda\}.^7 \tag{4.1}$$

(Note that the use of a general initial value $s$, rather than 0, has not added
to the generality and we drop this affectation below.)

(c) If $a \ge 0$, $b > 0$, $\alpha \ge 0$, $\beta > 0$, then

$$\Pr\{\operatorname*{L.U.B.}_{0 \le t < \infty} [\xi(t) - (at + b)] \ge 0\} = e^{-2ab/\sigma^2}, \tag{4.2}$$

$$\Pr\{\operatorname*{L.U.B.}_{0 \le t < \infty} [\xi(t) - (at + b)] \ge 0 \ \text{ or } \ \operatorname*{G.L.B.}_{0 \le t < \infty} [\xi(t) + \alpha t + \beta] \le 0\} \tag{4.3}$$

$$= \sum_{m=1}^{\infty} \Big\{ e^{-2[m^2 ab + (m-1)^2 \alpha\beta + m(m-1)(a\beta + \alpha b)]/\sigma^2} + e^{-2[(m-1)^2 ab + m^2 \alpha\beta + m(m-1)(a\beta + \alpha b)]/\sigma^2}$$

$$\qquad - e^{-2[m^2(ab + \alpha\beta) + m(m-1)a\beta + m(m+1)\alpha b]/\sigma^2} - e^{-2[m^2(ab + \alpha\beta) + m(m+1)a\beta + m(m-1)\alpha b]/\sigma^2} \Big\};$$

in particular ($\alpha = a$, $\beta = b$)

$$\Pr\left\{ \operatorname*{L.U.B.}_{0 \le t < \infty} \frac{|\xi(t)|}{at + b} \ge 1 \right\} = 2 \sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2 ab/\sigma^2}. \tag{4.3'}$$

7 Due to Bachelier; cf. the proof by P. Lévy, Comp. Math., Vol. 7 (1939), p. 293. One way
to prove (a) is to prove (4.1) first, with L.U.B. instead of Max, and then use it to calculate
the probabilities relevant to (a).

The probability in (4.2) is the probability that a $\xi(t)$ sample curve will ever
reach the line with slope $a$ and ordinate intercept $b$; the probability in (4.3) is
the probability that a sample curve will ever reach either of the indicated
halflines, one above and one below the $t$ axis. Since the right hand sides are
continuous functions of $a$, $b$, $\alpha$, $\beta$ we could write $> 0$ instead of $\ge 0$ and $< 0$
instead of $\le 0$ on the left, so that these probabilities are also the probabilities
that a sample curve will ever rise above the indicated line or leave the indicated
angle.

It will be convenient to describe a line by its slope and ordinate intercept;
the line $[u, v]$ is the line with slope $u$ and ordinate intercept $v$. We shall take
$\sigma = 1$ in the proof; this is no essential restriction since $\xi(t)/\sigma$ is the random
variable of a process of the same type whose $\sigma$ is 1.

To prove (4.2) let $\varphi(a, b)$ be the probability on the left, the probability that a
sample curve will reach the line $[a, b]$. If $b = b_1 + b_2$, $b_i > 0$, a sample curve
which is to reach $[a, b]$ must first reach $[a, b_1]$ and then move up to meet a line
with slope $a$, $b_2$ units above the first meeting with $[a, b_1]$. Then

$$\varphi(a, b_1 + b_2) = \varphi(a, b_1)\,\varphi(a, b_2).$$

Now $\varphi(a, b) \ge \Pr\{\xi(1) \ge a + b\} > 0$ and $\varphi(a, b)$ is monotone non-increasing in $b$,
for fixed $a$. The only solution of the functional equation with these properties is

$$\varphi(a, b) = e^{-\psi(a) b}.$$

Now $\varphi(a, b)$ is the probability of reaching $[0, b]$ at some first time $s$ and then
going on to the line $[a, b]$ which from the vantage point of the first common point
$(s, \xi(s))$ is the line $[a, as]$. In other words, using (4.1)

$$e^{-\psi(a) b} = \int_0^{\infty} e^{-\psi(a) a s}\, d_s \Pr\{\max_{0 \le t \le s} \xi(t) \ge b\} = \int_0^{\infty} e^{-\psi(a) a s}\, \frac{b\, e^{-b^2/2s}}{(2\pi)^{1/2} s^{3/2}}\, ds = e^{-b (2a\psi(a))^{1/2}},$$

from which it follows that $\psi(a) = 2a$, and this yields (4.2).

To prove (4.3) we consider first the following general problem: Let $[u_1, v_1]$,
$[u_2, v_2], \cdots$, $u_j \ge 0$, $v_j > 0$ be a sequence of lines; let $t = t_1$ be the first value of $t$,
if any, at which a sample curve meets $[u_1, v_1]$; if $t_1$ is defined for a sample curve
let $t_2$ be the first value of $t > t_1$, if any, at which the curve meets $[-u_2, -v_2]$; if
$t_2$ is defined for a sample curve, let $t_3$ be the first value of $t > t_2$, if any, at which
the curve meets $[u_3, v_3]$, and so on. Let $\pi_n$ be the probability that there is a point
$t_n$, in other words the probability that a sample curve meets the lines $[u_1, v_1]$,
$[-u_2, -v_2], \cdots, [(-1)^{n+1} u_n, (-1)^{n+1} v_n]$ in at least $n$ successive points. We write

$$\pi_n = \pi_n(u_1, v_1; \cdots; u_n, v_n).$$

In particular, according to (4.2)

$$\pi_1(u_1, v_1) = e^{-2u_1 v_1}. \tag{4.4}$$

To evaluate $\pi_n$, let $Q$ be the point $(t_{n-1}, \xi(t_{n-1}))$ on the sample curve, and
suppose for definiteness that $n$ is even. Starting at $Q$, if there is a $t_n$, the curve
must finally reach $[-u_n, -v_n]$, that is it must go to a line of slope $-u_n$, which is
$u_{n-1} t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n$ units vertically below its initial position $Q$ when
$t = t_{n-1}$. According to (4.2) the probability of doing this is

$$e^{-2u_n (u_{n-1} t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n)}.$$

Now we replace the line $[-u_n, -v_n]$ by a line which depends on $t_{n-1}$ but which
leaves this probability unchanged; the new line has slope $-(u_{n-1} + u_n)$ and is

$$h = \frac{u_n}{u_{n-1} + u_n} (u_{n-1} t_{n-1} + v_{n-1} + u_n t_{n-1} + v_n)$$

units below $Q$ when $t = t_{n-1}$. Finally we reflect this new line in the line parallel
to the $t$ axis through $Q$. These two changes do not affect the probability we are
discussing because the changes of $\xi(t)$ after $t_{n-1}$ are independent of the changes
before and have symmetric distributions. The final line has slope $u_{n-1} + u_n$
and is $h$ units above $Q$ when $t = t_{n-1}$; it is the line

$$\left[ u_{n-1} + u_n,\ \frac{u_{n-1} v_{n-1} + u_n v_n + 2u_n v_{n-1}}{u_{n-1} + u_n} \right]$$

which does not depend on $t_{n-1}$. This line lies above $[u_{n-1}, v_{n-1}]$ in the first quadrant,
so that if a sample curve reaches it the curve must also intersect $[u_{n-1}, v_{n-1}]$.
We have thus proved that

$$\pi_n(u_1, v_1; \cdots; u_n, v_n) = \pi_{n-1}\left( u_1, v_1; \cdots; u_{n-2}, v_{n-2};\ u_{n-1} + u_n,\ \frac{u_{n-1} v_{n-1} + u_n v_n + 2u_n v_{n-1}}{u_{n-1} + u_n} \right). \tag{4.5}$$



The fundamental identity (4.5) makes it possible to reduce the evaluation of
$\pi_n$ to $\pi_1$ in $n - 1$ steps; $\pi_1$ is evaluated in (4.4). Thus successive meetings with $n$
lines have been reduced to a meeting with a single line. As a first example suppose

$$u_1 = \cdots = u_n = u, \qquad v_1 = \cdots = v_n = v.$$

Then we have

$$\pi_n(u, v; \cdots; u, v) = \pi_{n-1}(u, v; \cdots; 2u, 2v) = \cdots = \pi_1(nu, nv),$$

so that

$$\pi_n(u, v; \cdots; u, v) = e^{-2n^2 uv}. \tag{4.6}$$
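The reduction described by (4.5) can be carried out mechanically. The following is a minimal sketch with $\sigma = 1$; the function name `pi_n` and the list-of-pairs representation are assumptions made for illustration. It merges the last two lines repeatedly until a single line remains and then applies (4.4). The sketch assumes the merged slope $u_{n-1} + u_n$ is positive, since it divides by it.

```python
import math


def pi_n(lines):
    """pi_n(u_1, v_1; ...; u_n, v_n) for a list of (u_j, v_j) pairs, sigma = 1."""
    lines = list(lines)
    while len(lines) > 1:
        u_last, v_last = lines.pop()   # (u_n, v_n)
        u_prev, v_prev = lines.pop()   # (u_{n-1}, v_{n-1})
        slope = u_prev + u_last
        # Intercept of the single replacement line in identity (4.5).
        intercept = (u_prev * v_prev + u_last * v_last
                     + 2.0 * u_last * v_prev) / slope
        lines.append((slope, intercept))
    u, v = lines[0]
    return math.exp(-2.0 * u * v)      # (4.4)


if __name__ == "__main__":
    u, v, n = 0.3, 0.5, 4
    print(pi_n([(u, v)] * n), math.exp(-2.0 * n * n * u * v))  # compare with (4.6)
```

With identical lines the two printed values should agree up to rounding, which is the content of (4.6).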

More generally suppose

$$u_1 = u_3 = \cdots = a, \qquad v_1 = v_3 = \cdots = b,$$

$$u_2 = u_4 = \cdots = \alpha, \qquad v_2 = v_4 = \cdots = \beta.$$

Then we show that for suitably chosen $C_i^{(n)}$'s we have, according as $n$ is even
or odd,

$$\pi_n(a, b; \cdots; \alpha, \beta) = \pi_1\left( \frac{n}{2}(a + \alpha),\ \frac{C_1^{(n)} ab + C_2^{(n)} \alpha\beta + C_3^{(n)} a\beta + C_4^{(n)} \alpha b}{\frac{n}{2}(a + \alpha)} \right), \tag{4.7}$$

$$\pi_n(a, b; \cdots; a, b) = \pi_1\left( \frac{n+1}{2} a + \frac{n-1}{2} \alpha,\ \frac{C_1^{(n)} ab + C_2^{(n)} \alpha\beta + C_3^{(n)} a\beta + C_4^{(n)} \alpha b}{\frac{n+1}{2} a + \frac{n-1}{2} \alpha} \right).$$

For $n = 1$ this form is correct with

$$C_1^{(1)} = 1, \qquad C_2^{(1)} = C_3^{(1)} = C_4^{(1)} = 0.$$

If now $n$ is even and if the equations are true for $n$,

$$\pi_{n+1}(a, b; \cdots; a, b) = \pi_2\left( a, b;\ \frac{n}{2}(\alpha + a),\ \frac{C_1^{(n)} \alpha\beta + C_2^{(n)} ab + C_3^{(n)} \alpha b + C_4^{(n)} a\beta}{\frac{n}{2}(\alpha + a)} \right)$$

$$= \pi_1\left( \frac{n+2}{2} a + \frac{n}{2} \alpha,\ \frac{ab + C_1^{(n)} \alpha\beta + C_2^{(n)} ab + C_3^{(n)} \alpha b + C_4^{(n)} a\beta + n(a + \alpha) b}{\frac{n+2}{2} a + \frac{n}{2} \alpha} \right),$$

and comparing this with (4.7) we find that

$$C_1^{(n+1)} = C_2^{(n)} + n + 1,$$

$$C_2^{(n+1)} = C_1^{(n)},$$

$$C_3^{(n+1)} = C_4^{(n)},$$

$$C_4^{(n+1)} = C_3^{(n)} + n.$$

If $n$ is odd we find similarly that

$$C_1^{(n+1)} = C_2^{(n)} + n,$$

$$C_2^{(n+1)} = C_1^{(n)},$$

$$C_3^{(n+1)} = C_4^{(n)},$$

$$C_4^{(n+1)} = C_3^{(n)} + n + 1.$$

The solution of these equations is

                n even              n odd

  C_1^{(n)}     n^2/4               (n + 1)^2/4
  C_2^{(n)}     n^2/4               (n - 1)^2/4
  C_3^{(n)}     n(n - 2)/4          (n^2 - 1)/4
  C_4^{(n)}     n(n + 2)/4          (n^2 - 1)/4

Then

$$\pi_n = e^{-\frac{1}{2}[n^2 ab + n^2 \alpha\beta + n(n-2) a\beta + n(n+2) \alpha b]} \qquad (n \text{ even}), \tag{4.8}$$

$$\pi_n = e^{-\frac{1}{2}[(n+1)^2 ab + (n-1)^2 \alpha\beta + (n^2 - 1) a\beta + (n^2 - 1) \alpha b]} \qquad (n \text{ odd}).$$

We can now prove (4.3). In fact the left side is equal to

$$\pi_1(a, b) + \pi_1(\alpha, \beta) - \pi_2(a, b; \alpha, \beta) - \pi_2(\alpha, \beta; a, b) + \cdots$$

which gives (4.3), on substituting (4.8). Only (4.3'), which follows from the
simple (4.6), is used in the application to the Kolmogorov-Smirnov theorems.
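The alternating sum that proves (4.3) can be evaluated term by term from the closed form (4.8). The sketch below takes $\sigma = 1$ and assumes all four constants are strictly positive so that the truncated series converges; the function names and the truncation parameter `terms` are hypothetical choices for illustration.

```python
import math


def pi_closed(n, a, b, alpha, beta):
    """Closed form (4.8) for lines alternating (a, b), (alpha, beta), ...; sigma = 1."""
    if n % 2 == 0:
        q = (n * n * a * b + n * n * alpha * beta
             + n * (n - 2) * a * beta + n * (n + 2) * alpha * b)
    else:
        q = ((n + 1) ** 2 * a * b + (n - 1) ** 2 * alpha * beta
             + (n * n - 1) * (a * beta + alpha * b))
    return math.exp(-0.5 * q)


def exit_probability(a, b, alpha, beta, terms=200):
    """Truncated sum pi_1(a,b) + pi_1(alpha,beta) - pi_2(...) - pi_2(...) + ..."""
    total = 0.0
    for n in range(1, terms + 1):
        sign = 1.0 if n % 2 == 1 else -1.0
        # The second term starts the alternation from the lower line instead.
        total += sign * (pi_closed(n, a, b, alpha, beta)
                         + pi_closed(n, alpha, beta, a, b))
    return total


if __name__ == "__main__":
    print(exit_probability(1.0, 1.0, 1.0, 1.0))  # symmetric case, compare (4.3')
    print(exit_probability(0.5, 1.0, 2.0, 0.7))
```

In the symmetric case $\alpha = a$, $\beta = b$ each term reduces to $e^{-2n^2 ab}$, so the sum should agree with the series in (4.3').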


5. Transformations of Gaussian processes to the Brownian movement process.
The $\xi(t)$ process studied in section 4 is so simple that it is important to be able to
reduce others to it by elementary changes of variable. For example, suppose that
the covariance function of a Gaussian process has the form

$$r(s, t) = u(s)v(t), \qquad s \le t, \tag{5.1}$$

for $s$, $t$ in some interval, and that the ratio

$$\frac{u(t)}{v(t)} = a(t)$$

is continuous and monotone increasing, with inverse function $a_1(t)$. We define

$$\xi(t) = \frac{x[a_1(t)]}{v[a_1(t)]}.$$

With this definition the $\xi$ process is Gaussian and since if $s < t$

$$E\{\xi(s)\xi(t)\} = \frac{u[a_1(s)]\, v[a_1(t)]}{v[a_1(s)]\, v[a_1(t)]} = a[a_1(s)] = s,$$

the $\xi$ process is the Brownian movement process with $\sigma = 1$. This transformation
from the $x$ to the $\xi$ process is effected by a combination of a change of variable
in $t$ and the application of a variable scaling factor. (Conversely, if such a transformation
is applied to the Brownian movement process it is trivial to verify
that the new covariance function will have the form (5.1). The Gaussian processes
with covariance functions of this form are easily seen to be the Gaussian Markov
processes.)
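As a small numerical check of this change of variable, the sketch below applies it to the covariance $r(s, t) = s(1 - t)$ treated in section 6, for which $u(s) = s$, $v(t) = 1 - t$, $a(t) = t/(1 - t)$ and $a_1(t) = t/(1 + t)$. The function names are hypothetical; the code only evaluates covariances and does not simulate any process.

```python
def r_x(s, t):
    """Covariance s(1 - t) for s <= t of the x(t) process on [0, 1]."""
    s, t = min(s, t), max(s, t)
    return s * (1.0 - t)


def a_inverse(t):
    """Inverse a_1(t) = t / (1 + t) of a(t) = t / (1 - t)."""
    return t / (1.0 + t)


def r_xi(s, t):
    """Covariance of xi(t) = x[a_1(t)] / v[a_1(t)], where v(t) = 1 - t."""
    scale_s = 1.0 / (1.0 - a_inverse(s))   # 1 / v[a_1(s)] = 1 + s
    scale_t = 1.0 / (1.0 - a_inverse(t))   # 1 / v[a_1(t)] = 1 + t
    return scale_s * scale_t * r_x(a_inverse(s), a_inverse(t))


if __name__ == "__main__":
    for s, t in [(0.7, 2.5), (3.0, 1.2), (4.0, 4.0)]:
        print(s, t, r_xi(s, t), min(s, t))  # last two columns should agree
```

The transformed covariance equals $\operatorname{Min}(s, t)$, which is the Brownian movement covariance with $\sigma = 1$.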

6. The Gaussian process with $r(s, t) = s(1 - t)$. In section 3 the Kolmogorov-Smirnov
theorems were reduced to properties of a Gaussian process with parameter
$t$, $0 \le t \le 1$, for which

$$\Pr\{x(0) = 0\} = 1;$$

$$E\{x(t)\} = 0;$$

$$E\{[x(t) - x(s)]^2\} = (t - s)[1 - (t - s)], \qquad 0 \le s \le t \le 1.$$

Now these equations imply that

$$E\{x(t)^2\} = t(1 - t), \qquad E\{x(s)^2\} = s(1 - s),$$

and combining the set we find that

$$r(s, t) = E\{x(s)x(t)\} = s(1 - t), \qquad 0 \le s \le t \le 1.$$

This covariance function has the form studied in section 5, and using the transformation
of that section

$$\xi(t) = (t + 1)\, x\!\left( \frac{t}{t + 1} \right), \qquad 0 \le t < \infty,$$

defines a Brownian movement process (with $\sigma = 1$). Then if $D$, $D^+$, $D^-$ are
defined as in section 3, we have from (4.3')

$$\Pr\{D \ge \lambda\} = \Pr\left\{ \operatorname*{L.U.B.}_{0 \le t < \infty} \frac{|\xi(t)|}{t + 1} \ge \lambda \right\} = 2 \sum_{m=1}^{\infty} (-1)^{m+1} e^{-2m^2\lambda^2},$$

and from (4.2)

$$\Pr\{D^+ \ge \lambda\} = \Pr\{D^- \ge \lambda\} = e^{-2\lambda^2}.$$

This proves (3.2') and (3.3'). Note that we could go beyond these results, because
of our detailed knowledge of the $x(t)$ process. For example we can evaluate

$$\lim_{n \to \infty} \Pr\{n^{1/2} D_n^- < \lambda_1,\ n^{1/2} D_n^+ < \lambda_2\}.$$

If $\lambda_1 = \lambda_2 = \lambda$ the probability is the probability that $n^{1/2} D_n < \lambda$ which we have
already treated. In general it is, in the limit,

$$\Pr\{\min_{0 \le t \le 1} x(t) > -\lambda_1,\ \max_{0 \le t \le 1} x(t) < \lambda_2\} = \Pr\left\{ \operatorname*{G.L.B.}_{0 \le t < \infty} \frac{\xi(t)}{t + 1} > -\lambda_1,\ \operatorname*{L.U.B.}_{0 \le t < \infty} \frac{\xi(t)}{t + 1} < \lambda_2 \right\}$$

$$= 1 - \sum_{m=1}^{\infty} \Big[ e^{-2[m^2\lambda_2^2 + (m-1)^2\lambda_1^2 + 2m(m-1)\lambda_1\lambda_2]} + e^{-2[(m-1)^2\lambda_2^2 + m^2\lambda_1^2 + 2m(m-1)\lambda_1\lambda_2]}$$

$$\qquad - e^{-2[m^2(\lambda_1^2 + \lambda_2^2) + m(m-1)\lambda_1\lambda_2 + m(m+1)\lambda_1\lambda_2]} - e^{-2[m^2(\lambda_1^2 + \lambda_2^2) + m(m+1)\lambda_1\lambda_2 + m(m-1)\lambda_1\lambda_2]} \Big]$$

$$= 1 - \sum_{m=1}^{\infty} \Big\{ e^{-2[m\lambda_2 + (m-1)\lambda_1]^2} + e^{-2[(m-1)\lambda_2 + m\lambda_1]^2} - 2e^{-2m^2(\lambda_1 + \lambda_2)^2} \Big\},$$

obtained by setting $a = b = \lambda_2$, $\alpha = \beta = \lambda_1$ in (4.3).
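The joint limit can be evaluated from the last form of the series. The following sketch is illustrative only: the function name `joint_limit` and the truncation parameter `terms` are assumptions, and both $\lambda_1$ and $\lambda_2$ are taken to be strictly positive.

```python
import math


def joint_limit(lam1, lam2, terms=200):
    """Limit of Pr{n^(1/2) D_n^- < lam1, n^(1/2) D_n^+ < lam2}, lam1, lam2 > 0."""
    total = 0.0
    for m in range(1, terms + 1):
        total += (math.exp(-2.0 * (m * lam2 + (m - 1) * lam1) ** 2)
                  + math.exp(-2.0 * ((m - 1) * lam2 + m * lam1) ** 2)
                  - 2.0 * math.exp(-2.0 * m * m * (lam1 + lam2) ** 2))
    return 1.0 - total


if __name__ == "__main__":
    print(joint_limit(1.0, 1.0))    # equals 1 minus the series in (3.2') at lambda = 1
    print(joint_limit(10.0, 1.0))   # lower bound irrelevant: close to 1 - exp(-2)
```

Two consistency checks follow from the text: equal arguments reproduce the two-sided result of (3.2'), and letting one argument grow large recovers the one-sided result of (3.3').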



PEARSONIAN CORRELATION COEFFICIENTS ASSOCIATED WITH 
LEAST SQUARES THEORY 

By Paul S. Dwyer 
University of Michigan 

1. Introduction and summary. It is well known that the zero-order correlation
between the predicted value of a variable and the observed value of the variable 
is the multiple correlation. It is also well known that the zero-order correlation 
between the residuals for two different variables, when the prediction is from a 
common set of variables, is the partial correlation. These considerations naturally 
lead to a systematic investigation of all the zero-order correlations involving 
the various variables associated with least squares theory. Such an investigation 
is the purpose of this paper. 

As a result of this study it appears that other zero-order correlations include 
the multiple alienation coefficient, the part correlation coefficient, and certain 
other coefficients which, as far as I am aware, have not been previously defined. 

The paper first examines the case of a single predicted variable and then 
continues with the case in which two or more variables are predicted simultane¬ 
ously. The paper includes (1) a theoretical development of the different coeffi¬ 
cients and the relations between them, (2) the expression of the formulas in 
determinantal form, (3) a matrix presentation of the material, and (4) an outline 
of the calculational techniques—with illustrations. 

It should be made clear at the start that this paper deals with populations 
(finite or infinite) and not with samples from those populations. The sampling 
distribution of each of the new correlation coefficients defined in this paper 
might well become the subject of a later investigation, but first we need to
know what these correlation coefficients are.

2. The case of the single predicted variable. Notation, definitions, and basic
properties. We suppose that a population consists of $N$ individuals with values
$X_{1j}, X_{2j}, \cdots, X_{kj}, Y_j$ for the variables $X_1, X_2, \cdots, X_k, Y$ and that $Y$ is
linearly predicted from the $X_i$ by the formula

$$E = Y - a_0 - a_1 X_1 - a_2 X_2 - \cdots - a_k X_k = Y - \hat{Y} \tag{1}$$

by least squares theory. For the purposes of this paper, we use a concise summation
notation, $\Sigma Q$, in place of the more formal serial notation $\sum_{i=1}^{N} Q_i$ which is
preferable to the frequency notation $\sum_{a}^{b} Q_x f_x$ and, in the continuous case,
$\int_a^b Q_x f_x\, dx$. Moreover it is desirable that the scales of $X$ and $Y$ be chosen so as
to facilitate the easy determination of the various formulas. If we let

$$y = \frac{Y - \bar{Y}}{\sqrt{N}\,\sigma_Y}, \qquad x_i = \frac{X_i - \bar{X}_i}{\sqrt{N}\,\sigma_{X_i}}, \tag{2}$$

we have $\Sigma x_i^2 = \Sigma y^2 = 1$ with the resulting correlation formula

$$\rho_{x_i y} = \frac{\Sigma x_i y}{\sqrt{(\Sigma x_i^2)(\Sigma y^2)}} = \Sigma x_i y \quad \text{and} \quad \rho_{x_i x_j} = \Sigma x_i x_j. \tag{3}$$

The transformations (2) when applied to (1) give

$$e = y - (\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = y - \hat{y} \tag{4}$$

where the $\beta$'s are standard regression coefficients and $e$ is defined to be

$$\frac{E}{\sqrt{N}\,\sigma_Y}.$$

It is to be noted that the values of $x_i$, $y$, $e$, and $\hat{y}$ are all dimensionless.

The values we wish to correlate are those of $X_i$, $Y$, $E$, $\hat{Y}$ of (1). The zero-order
correlations involving these are the same as for $x_i$, $y$, $e$, $\hat{y}$ of (4).


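3. Correlations with a single predicted variable. We wish to minimize
$\Sigma e^2$. Differentiating with respect to $\beta_i$ and equating to zero we get

$$\Sigma e x_i = 0 \tag{5}$$

from which by multiplication by $\beta_i$ and summation for $i$,

$$\Sigma e \hat{y} = 0. \tag{6}$$

It follows that

$$\Sigma e^2 = \Sigma e(y - \hat{y}) = \Sigma e y = \Sigma (y - \hat{y}) y = \Sigma y^2 - \Sigma \hat{y} y = 1 - \Sigma \hat{y} y. \tag{7}$$

Using (4) and (7), we get

$$\Sigma e^2 = 1 - \Sigma (e + \hat{y}) \hat{y} = 1 - \Sigma \hat{y}^2$$

so that

$$\Sigma \hat{y}^2 = \Sigma \hat{y} y = 1 - \Sigma e^2 = 1 - \frac{\sigma_E^2}{\sigma_Y^2} = \frac{\sigma_Y^2 - \sigma_E^2}{\sigma_Y^2}. \tag{8}$$

This is the conventional definition (from least squares theory) of the square of the
multiple correlation coefficient, so

$$\rho^2_{y \cdot x_1 x_2 \cdots x_k} = \rho^2_{y(x)} = \Sigma \hat{y}^2 = \Sigma \hat{y} y. \tag{9}$$

Application of (9) to (7) gives

$$\Sigma e^2 = 1 - \rho^2_{y(x)} = \kappa^2_{y(x)} \tag{10}$$

where $\kappa_{y(x)}$ is the multiple alienation coefficient. We now have $\Sigma x_i^2 = 1$, $\Sigma y^2 = 1$,
$\Sigma e^2 = \kappa^2_{y(x)}$, and $\Sigma \hat{y}^2 = \rho^2_{y(x)}$, so that we are able to present formulas involving
$x_i$, $y$, $e$, $\hat{y}$. We first form the cross products

$$\Sigma x_i y = \rho_{x_i y}, \tag{11}$$

$$\Sigma x_i e = 0, \tag{12}$$

$$\Sigma x_i \hat{y} = \Sigma x_i (y - e) = \Sigma x_i y = \rho_{x_i y}, \tag{13}$$

$$\Sigma y e = \Sigma y (y - \hat{y}) = \Sigma y^2 - \Sigma y \hat{y} = 1 - \Sigma \hat{y}^2 = \kappa^2_{y(x)}, \tag{14}$$

$$\Sigma y \hat{y} = \Sigma \hat{y}^2 = \rho^2_{y(x)}, \tag{15}$$

$$\Sigma e \hat{y} = 0. \tag{16}$$

We then have

$$\rho_{x_i x_j} = \frac{\Sigma x_i x_j}{\sqrt{(\Sigma x_i^2)(\Sigma x_j^2)}} = \Sigma x_i x_j, \tag{17}$$

$$\rho_{x_i y} = \frac{\Sigma x_i y}{\sqrt{(\Sigma x_i^2)(\Sigma y^2)}} = \Sigma x_i y, \tag{18}$$

$$\rho_{x_i e} = \frac{\Sigma x_i e}{\sqrt{(\Sigma x_i^2)(\Sigma e^2)}} = 0, \tag{19}$$

$$\rho_{x_i \hat{y}} = \frac{\Sigma x_i \hat{y}}{\sqrt{(\Sigma x_i^2)(\Sigma \hat{y}^2)}} = \frac{\Sigma x_i y}{\rho_{y(x)}} = \frac{\rho_{x_i y}}{\rho_{y(x)}}. \tag{20}$$

It is interesting to note that this is unity in case $k = 1$ for then $\rho_{xy} = \rho_{y(x)}$. Otherwise
the absolute value of $\rho_{x_i \hat{y}}$ is larger than that of $\rho_{x_i y}$. For this reason this
coefficient might be called the multiple augmented correlation coefficient.

$$\rho_{ye} = \frac{\Sigma y e}{\sqrt{(\Sigma y^2)(\Sigma e^2)}} = \frac{\kappa^2_{y(x)}}{\kappa_{y(x)}} = \kappa_{y(x)}. \tag{21}$$

Thus the correlation between $y$ and its residual is the multiple alienation coefficient.

$$\rho_{y \hat{y}} = \frac{\Sigma y \hat{y}}{\sqrt{(\Sigma y^2)(\Sigma \hat{y}^2)}} = \frac{\rho^2_{y(x)}}{\rho_{y(x)}} = \rho_{y(x)}. \tag{22}$$

Thus, as is well known, the zero-order correlation between observed and predicted
$y$ is the multiple correlation.

$$\rho_{e \hat{y}} = \frac{\Sigma e \hat{y}}{\sqrt{(\Sigma e^2)(\Sigma \hat{y}^2)}} = 0. \tag{23}$$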
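The relations (20) to (23) can be checked numerically on a small finite population. The following is a minimal sketch using only the standard library; the data are generated artificially, the case $k = 2$ is chosen so the normal equations can be solved in closed form, and all function names are hypothetical illustrations rather than anything defined in the paper.

```python
import math
import random


def standardize(values):
    """Scale as in (2): subtract the mean, divide by sqrt(N) times sigma."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def corr(p, q):
    """Zero-order correlation of two zero-mean variables."""
    return dot(p, q) / math.sqrt(dot(p, p) * dot(q, q))


def fit(x1, x2, y):
    """Solve the two normal equations (5) and return (y_hat, e)."""
    r12, r1y, r2y = dot(x1, x2), dot(x1, y), dot(x2, y)
    det = 1.0 - r12 * r12
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    y_hat = [b1 * p + b2 * q for p, q in zip(x1, x2)]
    return y_hat, [v - h for v, h in zip(y, y_hat)]


def section3_check(seed=0, n=200):
    rng = random.Random(seed)  # artificial population, fixed seed
    raw1 = [rng.gauss(0, 1) for _ in range(n)]
    raw2 = [0.5 * a + rng.gauss(0, 1) for a in raw1]
    raw_y = [a - 2.0 * b + rng.gauss(0, 1) for a, b in zip(raw1, raw2)]
    x1, x2, y = standardize(raw1), standardize(raw2), standardize(raw_y)
    y_hat, e = fit(x1, x2, y)
    return {
        "rho": math.sqrt(dot(y_hat, y_hat)),    # multiple correlation, (9)
        "kappa": math.sqrt(dot(e, e)),          # multiple alienation, (10)
        "rho_x1_y": dot(x1, y),
        "rho_x1_yhat": corr(x1, y_hat),         # should equal rho_x1_y / rho, (20)
        "rho_y_e": corr(y, e),                  # should equal kappa, (21)
        "rho_y_yhat": corr(y, y_hat),           # should equal rho, (22)
        "rho_e_yhat": corr(e, y_hat),           # should vanish, (23)
    }


if __name__ == "__main__":
    for name, value in section3_check().items():
        print(name, round(value, 6))
```

Because these identities are algebraic, they should hold to rounding error for any population, not only for the artificial one generated here.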


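4. Notation for the general case. We need to extend the notation and the
definitions before examining explicit formulas for the more general case of two
(or more) predicted variables. Suppose that $Y_i$ and $Y_j$ are the two variables
predicted from the same $X$'s. Then from (4) we write

$$e_i = \frac{E_i}{\sqrt{N}\,\sigma_{Y_i}} = y_i - \beta_{i1} x_1 - \beta_{i2} x_2 - \cdots - \beta_{ik} x_k = y_i - \hat{y}_i, \tag{24}$$

$$e_j = \frac{E_j}{\sqrt{N}\,\sigma_{Y_j}} = y_j - \beta_{j1} x_1 - \beta_{j2} x_2 - \cdots - \beta_{jk} x_k = y_j - \hat{y}_j.$$

We then have the two sets of normal equations

$$\Sigma e_i x = 0, \qquad \Sigma e_j x = 0 \tag{25}$$

so that

$$\Sigma e_i \hat{y}_i = 0, \qquad \Sigma e_i \hat{y}_j = 0, \tag{26}$$

$$\Sigma e_j \hat{y}_i = 0, \qquad \Sigma e_j \hat{y}_j = 0.$$

It follows that

$$\Sigma e_i e_j = \Sigma e_i (y_j - \hat{y}_j) = \Sigma e_i y_j = \Sigma (y_i - \hat{y}_i) y_j = \Sigma y_i y_j - \Sigma \hat{y}_i y_j \tag{27}$$

$$= \Sigma y_i y_j - \Sigma y_i \hat{y}_j = \Sigma y_i y_j - \Sigma \hat{y}_i \hat{y}_j = \rho_{ij} - \Sigma \hat{y}_i \hat{y}_j,$$

if we use the notation that $\rho_{ij} = \rho_{y_i y_j}$.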

(26) 


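5. The correlations involving more than one predicted variable. In this case
the $y$'s, the $e$'s and the $\hat{y}$'s (as well as the $x$'s) can have more than one variable
so that the correlation coefficients we need, in addition to those of section 3, are
$\rho_{y_i y_j}$, $\rho_{e_i e_j}$, $\rho_{\hat{y}_i \hat{y}_j}$, $\rho_{y_i e_j}$, $\rho_{e_i y_j}$, $\rho_{y_i \hat{y}_j}$, $\rho_{\hat{y}_i y_j}$, $\rho_{e_i \hat{y}_j}$, and $\rho_{\hat{y}_i e_j}$. We need now only the
summed products

$$\Sigma y_i y_j = \rho_{y_i y_j} = \rho_{ij}, \tag{28}$$

$$\Sigma e_i e_j = \rho_{ij} - \Sigma \hat{y}_i \hat{y}_j \quad \text{as given in (27)}, \tag{29}$$

$$\Sigma y_i e_j = \Sigma y_i (y_j - \hat{y}_j) = \Sigma y_i y_j - \Sigma y_i \hat{y}_j = \rho_{ij} - \Sigma \hat{y}_i \hat{y}_j, \tag{30}$$

$$\Sigma e_i \hat{y}_j = 0. \tag{31}$$

We have then

$$\rho_{y_i y_j} = \frac{\Sigma y_i y_j}{\sqrt{(\Sigma y_i^2)(\Sigma y_j^2)}} = \rho_{ij}, \tag{32}$$

$$\rho_{e_i e_j} = \frac{\Sigma e_i e_j}{\sqrt{(\Sigma e_i^2)(\Sigma e_j^2)}} = \frac{\rho_{ij} - \Sigma \hat{y}_i \hat{y}_j}{\kappa_{i(x)} \kappa_{j(x)}}. \tag{33}$$

This is the partial correlation coefficient.

$$\rho_{\hat{y}_i \hat{y}_j} = \frac{\Sigma \hat{y}_i \hat{y}_j}{\sqrt{(\Sigma \hat{y}_i^2)(\Sigma \hat{y}_j^2)}} = \frac{\Sigma \hat{y}_i \hat{y}_j}{\rho_{i(x)} \rho_{j(x)}}. \tag{34}$$

This coefficient appears to be new. Since it is the correlation of predicted values,
I suggest that it be called the predictions correlation coefficient.

$$\rho_{y_i e_j} = \frac{\Sigma y_i e_j}{\sqrt{(\Sigma y_i^2)(\Sigma e_j^2)}} = \frac{\rho_{ij} - \Sigma \hat{y}_i \hat{y}_j}{\kappa_{j(x)}}, \tag{35}$$

$$\rho_{e_i y_j} = \frac{\Sigma e_i y_j}{\sqrt{(\Sigma e_i^2)(\Sigma y_j^2)}} = \frac{\rho_{ij} - \Sigma \hat{y}_i \hat{y}_j}{\kappa_{i(x)}}. \tag{36}$$

The correlations given by (35) and (36) have been defined previously and are
known as part correlation coefficients [1; 213, 497].

$$\rho_{y_i \hat{y}_j} = \frac{\Sigma y_i \hat{y}_j}{\sqrt{(\Sigma y_i^2)(\Sigma \hat{y}_j^2)}} = \frac{\Sigma \hat{y}_i \hat{y}_j}{\rho_{j(x)}}, \tag{37}$$

$$\rho_{\hat{y}_i y_j} = \frac{\Sigma \hat{y}_i y_j}{\sqrt{(\Sigma \hat{y}_i^2)(\Sigma y_j^2)}} = \frac{\Sigma \hat{y}_i \hat{y}_j}{\rho_{i(x)}}. \tag{38}$$

The correlations of (37) and (38) appear to be new. Each is, in a sense, a generalization
of the multiple correlation coefficient since it becomes the multiple correlation
coefficient when $i = j$. I suggest that it might be called the cross multiple
correlation coefficient, since it correlates the actual value of one variable with
the predicted value of another.

$$\rho_{e_i \hat{y}_j} = \frac{\Sigma e_i \hat{y}_j}{\sqrt{(\Sigma e_i^2)(\Sigma \hat{y}_j^2)}} = 0, \qquad \rho_{\hat{y}_i e_j} = \frac{\Sigma \hat{y}_i e_j}{\sqrt{(\Sigma \hat{y}_i^2)(\Sigma e_j^2)}} = 0. \tag{39}$$

A summary of definitions and names of Pearsonian correlation coefficients associated
with least squares theory is presented in Table I. No name is proposed
when the coefficient is identically zero.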


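6. Relations between the correlations. Many relations exist between the
correlations defined in earlier sections. Some of the more interesting of these
are obtained by the elimination of $\Sigma \hat{y}_i \hat{y}_j$ from formulas involving this term. Thus
from (34), (37), and (38) we get

$$\Sigma \hat{y}_i \hat{y}_j = \rho_{\hat{y}_i \hat{y}_j}\, \rho_{i(x)} \rho_{j(x)} = \rho_{y_i \hat{y}_j}\, \rho_{j(x)} = \rho_{\hat{y}_i y_j}\, \rho_{i(x)}$$

and from (33), (35), and (36) we get

$$\rho_{ij} - \Sigma \hat{y}_i \hat{y}_j = \rho_{e_i e_j}\, \kappa_{i(x)} \kappa_{j(x)} = \rho_{y_i e_j}\, \kappa_{j(x)} = \rho_{e_i y_j}\, \kappa_{i(x)}.$$

We then have

$$\left. \begin{array}{l} \rho_{ij} - \rho_{\hat{y}_i \hat{y}_j}\, \rho_{i(x)} \rho_{j(x)} \\ \rho_{ij} - \rho_{y_i \hat{y}_j}\, \rho_{j(x)} \\ \rho_{ij} - \rho_{\hat{y}_i y_j}\, \rho_{i(x)} \end{array} \right\} = \left\{ \begin{array}{l} \rho_{e_i e_j}\, \kappa_{i(x)} \kappa_{j(x)} \\ \rho_{y_i e_j}\, \kappa_{j(x)} \\ \rho_{e_i y_j}\, \kappa_{i(x)} \end{array} \right. \tag{40}$$

where the six members may be equated in all possible ways.

Interesting and simple relations can also be obtained by formation of ratios.
Thus

$$\frac{\rho_{e_i e_j}}{\rho_{y_i e_j}} = \frac{1}{\kappa_{i(x)}}, \qquad \frac{\rho_{e_i e_j}}{\rho_{e_i y_j}} = \frac{1}{\kappa_{j(x)}}, \qquad \text{so} \qquad \frac{\rho_{y_i e_j}}{\rho_{e_i y_j}} = \frac{\kappa_{i(x)}}{\kappa_{j(x)}}. \tag{41}$$

TABLE I

  Definition                                      Name

  Single predicted variable
  $\rho_{x_i x_j}$                                Correlation coefficient of zero order
  $\rho_{x_i y}$                                  Correlation coefficient of zero order
  $\rho_{x_i e} = 0$                              None
  $\rho_{x_i \hat{y}} = \rho_{x_i y}/\rho_{y(x)}$ *Multiple augmented correlation coefficient
  $\rho_{y e} = \kappa_{y(x)}$                    Multiple alienation coefficient
  $\rho_{y \hat{y}} = \rho_{y(x)}$                Multiple correlation coefficient
  $\rho_{e \hat{y}} = 0$                          None

  Two or more predicted variables
  $\rho_{y_i y_j}$                                Correlation coefficient of zero order
  $\rho_{e_i e_j}$                                Partial correlation coefficient
  $\rho_{\hat{y}_i \hat{y}_j}$                    *Predictions correlation coefficient
  $\rho_{y_i e_j}$                                Part correlation coefficient
  $\rho_{y_i \hat{y}_j}$                          *Cross multiple correlation coefficient
  $\rho_{e_i \hat{y}_j} = 0$                      None

  * Proposed name

Similarly

$$\frac{\rho_{y_i \hat{y}_j}}{\rho_{\hat{y}_i y_j}} = \frac{\rho_{i(x)}}{\rho_{j(x)}}. \tag{42}$$

The geometric mean of similar coefficients yields such expressions as

$$\sqrt{\rho_{y_i e_j}\, \rho_{e_i y_j}} = \rho_{e_i e_j} \sqrt{\kappa_{i(x)} \kappa_{j(x)}}, \tag{43}$$

$$\sqrt{\rho_{y_i \hat{y}_j}\, \rho_{\hat{y}_i y_j}} = \rho_{\hat{y}_i \hat{y}_j} \sqrt{\rho_{i(x)} \rho_{j(x)}}.$$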
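The formulas (33), (34), (35) and (37) for two predicted variables can be verified on an artificial finite population. The sketch below is self-contained and uses only the standard library; the case of two predictors, the generated data, and all function names are assumptions made for illustration.

```python
import math
import random


def standardize(values):
    """Subtract the mean and scale to unit sum of squares, as in (2)."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    norm = math.sqrt(sum(d * d for d in dev))
    return [d / norm for d in dev]


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def corr(p, q):
    return dot(p, q) / math.sqrt(dot(p, p) * dot(q, q))


def fit(x1, x2, y):
    """Least squares prediction of y from x1, x2; returns (y_hat, e)."""
    r12, r1y, r2y = dot(x1, x2), dot(x1, y), dot(x2, y)
    det = 1.0 - r12 * r12
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    y_hat = [b1 * p + b2 * q for p, q in zip(x1, x2)]
    return y_hat, [v - h for v, h in zip(y, y_hat)]


def section5_check(seed=1, n=300):
    """Return {name: (direct correlation, value from the formula)}."""
    rng = random.Random(seed)  # artificial population, fixed seed
    raw1 = [rng.gauss(0, 1) for _ in range(n)]
    raw2 = [0.3 * a + rng.gauss(0, 1) for a in raw1]
    raw_i = [a + b + rng.gauss(0, 1) for a, b in zip(raw1, raw2)]
    raw_j = [a - b + 0.5 * c + rng.gauss(0, 1)
             for a, b, c in zip(raw1, raw2, raw_i)]
    x1, x2 = standardize(raw1), standardize(raw2)
    yi, yj = standardize(raw_i), standardize(raw_j)
    yi_hat, ei = fit(x1, x2, yi)
    yj_hat, ej = fit(x1, x2, yj)
    rho_ij, s = dot(yi, yj), dot(yi_hat, yj_hat)
    rho_i, rho_j = math.sqrt(dot(yi_hat, yi_hat)), math.sqrt(dot(yj_hat, yj_hat))
    kap_i, kap_j = math.sqrt(dot(ei, ei)), math.sqrt(dot(ej, ej))
    return {
        "partial (33)": (corr(ei, ej), (rho_ij - s) / (kap_i * kap_j)),
        "predictions (34)": (corr(yi_hat, yj_hat), s / (rho_i * rho_j)),
        "part (35)": (corr(yi, ej), (rho_ij - s) / kap_j),
        "cross multiple (37)": (corr(yi, yj_hat), s / rho_j),
    }


if __name__ == "__main__":
    for name, (direct, formula) in section5_check().items():
        print(name, round(direct, 6), round(formula, 6))
```

Each pair of printed values should agree to rounding error; the ratio and geometric-mean relations (41) to (43) then follow by simple algebra from the same quantities.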


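7. Determinantal formulas. The implicit normal equations (5) become when
expanded

$$\begin{array}{l} \rho_{11}\beta_1 + \rho_{12}\beta_2 + \cdots + \rho_{1k}\beta_k = \rho_{1y} \\ \rho_{21}\beta_1 + \rho_{22}\beta_2 + \cdots + \rho_{2k}\beta_k = \rho_{2y} \\ \qquad \cdots \\ \rho_{k1}\beta_1 + \rho_{k2}\beta_2 + \cdots + \rho_{kk}\beta_k = \rho_{ky} \end{array} \tag{44}$$

while $\Sigma \hat{y} y = \Sigma \hat{y}^2 = \rho^2_{y(x)}$ becomes

$$\rho_{y1}\beta_1 + \rho_{y2}\beta_2 + \cdots + \rho_{yk}\beta_k = \rho^2_{y(x)}. \tag{45}$$

Let $\Delta$ be the determinant of the matrix of the correlations of the $k$ $x$'s and $y$. Let $\Delta'$
be the corresponding determinant with $\rho_{yy}$ replaced by $\rho^2_{y(x)}$. Let $\Delta_{yy}$ be the
determinant of the correlation matrix of the $k$ $x$'s. Then $\rho^2_{y(x)} = \Sigma \hat{y}^2 = \Sigma \hat{y} y$ can
be expressed as a function of $\Delta$ and $\Delta_{yy}$. If (44) and (45) are to hold simultaneously,
then $\Delta' = 0$. Expanding $\Delta'$ in terms of the bottom row, we get

$$\Delta' = 0 = \rho^2_{y(x)} \Delta_{yy} + \text{“terms”}. \tag{46}$$

Similarly

$$\Delta = \rho_{yy} \Delta_{yy} + \text{“terms”} \tag{47}$$

where the “terms” of (46) and (47) are identical. It follows by subtraction that
$\Delta = (1 - \rho^2_{y(x)}) \Delta_{yy}$ and hence that

$$\rho^2_{y(x)} = 1 - \frac{\Delta}{\Delta_{yy}}. \tag{48}$$

Then

$$\Sigma e^2 = \Sigma e y = \kappa^2_{y(x)} = 1 - \Sigma \hat{y}^2 = 1 - \left( 1 - \frac{\Delta}{\Delta_{yy}} \right) = \frac{\Delta}{\Delta_{yy}}. \tag{49}$$

Correlation formulas of section 3 then appear as

$$\rho_{x_i \hat{y}} = \frac{\rho_{x_i y}}{\sqrt{1 - \dfrac{\Delta}{\Delta_{yy}}}}, \tag{50}$$

$$\rho_{y e} = \sqrt{\frac{\Delta}{\Delta_{yy}}}, \tag{51}$$

$$\rho_{y \hat{y}} = \sqrt{1 - \frac{\Delta}{\Delta_{yy}}}. \tag{52}$$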




In a similar way the normal equations (25) become two sets of normal equations. 
The first set is like (44) with ft replaced by ft,,, and p„. replaced by p Vi „ The 
second set is similar with i replaced by j. It is desired to find 

(53) — 2yiyj = p V) ift + Pi/,zft + • • ■ + 

Now using (53) with (51) as applied to y, and using the technique of the first 
part of this section, we get 

(54) = Pvivj^vivt vjvt "b “terms”, 

(55) 0 = 'ZyjyAwvivj + “terms”, 

where $\Delta$ is the determinant of the matrix of the correlations of the $k$ $x$'s, $y_i$, and $y_j$; $\Delta_{y_i y_j}$ is the determinant obtained by deleting the column involving correlations of $y_i$ and the row involving correlations of $y_j$; $\Delta_{y_i y_i y_j y_j}$ is the determinant of the matrix of the $k$ $x$'s; and the “terms” in (54) and (55) are identical. It
follows that

(56) = PlJ - 

^VtVfVjVj 

and thence 

(57) Pv - Y Ul y 7 =. 

'Vjvi 

The formulas of section (5) then appear in determinant form as follows 


(58) 


as is well known. 



Formulas for and p v , v , are similar to (60) and (61). 

Modern methods of calculating determinants [2], [3], [4], [5] are advised if
calculations are to be made from these formulas.

8. Matrix formulas. A matrix presentation is very useful in exhibiting the 
general features of this theory and in developing compact and easy methods 
of calculation with finite populations. The matrix presentation here is similar 
to that given by the author in a previous article [6]. 

Let the normal equations (24) be represented by the matrix equation

$$E = Y - XB = Y - \hat{Y}. \tag{62}$$

Then the sets of normal equations become

$$X'E = 0 \quad \text{or} \quad X'(Y - XB) = 0$$

so that

$$X'XB = X'Y. \tag{63}$$

Now since $XB = \hat{Y}$, (63) can be written as $X'\hat{Y} = X'Y$ and it can be shown that

$$\hat{Y}'Y = Y'\hat{Y} = \hat{Y}'\hat{Y}. \tag{64}$$


But under the assumptions of section 2, $X'X$ is the matrix of the intercorrelations of the $x$'s, $X'Y$ is the matrix of the intercorrelations of the $x$'s and $y$'s, and $Y'Y$ is the matrix of the intercorrelations of the $y$'s. Hence (63) can be written

$$R_{xx}B = R_{xy} \tag{65}$$

so that

$$B = R_{xx}^{-1}R_{xy}. \tag{66}$$


If $Y$ is composed of a single variable, $B$ is a single column matrix (vector), but if $Y$ is composed of $m$ variables, $B$ is an $m$ column matrix. It follows at once that

$$\hat{Y}'Y = \hat{Y}'\hat{Y} = B'X'XB = B'R_{xx}B = R_{yx}R_{xx}^{-1}R_{xx}R_{xx}^{-1}R_{xy} = R_{yx}R_{xx}^{-1}R_{xy} \tag{67}$$

and that

$$E'E = (Y - XB)'E = Y'E = Y'(Y - XB) = Y'Y - Y'\hat{Y} = Y'Y - \hat{Y}'\hat{Y} = R_{yy} - R_{yx}R_{xx}^{-1}R_{xy}. \tag{68}$$


It thus appears that the matrix (67) has diagonal terms $\Sigma \hat{y}_i^2 = \Sigma \hat{y}_i y_i$ which are the squares of the multiple correlation coefficients, and that the non-diagonal terms are $\Sigma \hat{y}_i \hat{y}_j = \Sigma \hat{y}_i y_j$. Similarly the matrix (68) has diagonal terms $\Sigma e_i^2 = \kappa_{i(x)}^2$ and non-diagonal terms $\Sigma e_i e_j = \Sigma e_i y_j$. It follows that all the correlation coefficients defined above may be calculated from the matrices $R_{xx}$, $R_{xy}$, $R_{yy}$, $\hat{Y}'\hat{Y}$, and $E'E$. The matrix (67) might be called the multiple correlation matrix and the matrix (68) the multiple alienation matrix.

Conventional results are expressed in terms of the correlation matrices $R_{xx}$, $R_{xy}$, and $R_{yy}$. All the correlation coefficients defined in this paper may be expressed in terms of these matrices and the multiple correlation and alienation matrices.


9. Calculational method of determining the multiple correlation and multiple alienation matrices. Various methods might be used in calculating the multiple correlation and alienation matrices from the correlation matrices. One method utilizes the square root method of solving simultaneous equations, which has recently been presented in a number of places [7], [8], together with a device which is similar to that used by Aitken [9] in eliminating the back solution. This method solves the equation (65) by forming the auxiliary

$$S_{xx}B = S_{xx}R_{xx}^{-1}R_{xy} \tag{69}$$

where $S_{xx}$ is a triangular matrix such that

$$R_{xx} - S_{xx}'S_{xx} = 0. \tag{70}$$


TABLE II

General                        Illustration

R_yy                                                        1.000    .495
                                                              -     1.000

R_xx | R_xy                    1.000   .652   .554   .615    .313    .650
                                 -    1.000   .747   .693    .280    .803
                                 -      -    1.000   .774    .182    .804
                                 -      -      -    1.000    .166    .812

S_xx | S_xx R_xx^{-1} R_xy     1.000   .652   .554   .615    .313    .650
                                       .758   .509   .385    .100    .500
                                              .659   .360   -.064    .287
                                                     .586   -.072    .199

Ŷ'Ŷ                                                          .117    .221
                                                              -      .794

E'E                                                          .883    .274
                                                              -      .206



The right hand side of (69), when premultiplied by its transpose, yields

$$(S_{xx}R_{xx}^{-1}R_{xy})'(S_{xx}R_{xx}^{-1}R_{xy}) = R_{yx}R_{xx}^{-1}S_{xx}'S_{xx}R_{xx}^{-1}R_{xy} = R_{yx}R_{xx}^{-1}R_{xy} = \hat{Y}'\hat{Y}. \tag{71}$$

Speaking less technically, it is only necessary to multiply the columns of $S_{xx}R_{xx}^{-1}R_{xy}$ to get $\hat{Y}'\hat{Y}$.

A first illustration utilizes the correlations of the Carver anthropometric data [10] for 1000 University of Michigan freshmen. This group may be regarded as constituting a population, or it may be regarded as a random sample of a larger population. For present purposes we regard it as a population. Height ($Y_1$) and weight ($Y_2$) are estimated from shoulder girth ($X_1$), chest girth ($X_2$), waist girth ($X_3$), and right thigh girth ($X_4$). The calculation of $\hat{Y}'\hat{Y}$ and $E'E$ from the correlation matrices is shown in Table II.
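The square root method described in this section can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's worksheet: the function names `square_root_method` and `column_products` are hypothetical, and the correlations are the Carver values as read from Table II (the entries of a correlation matrix below the diagonal are filled in by symmetry).

```python
from math import sqrt

def square_root_method(R_xx, R_xy):
    """Return (S, T): upper-triangular S with S'S = R_xx, and T = S R_xx^{-1} R_xy."""
    k, m = len(R_xx), len(R_xy[0])
    S = [[0.0] * k for _ in range(k)]
    T = [[0.0] * m for _ in range(k)]
    for i in range(k):
        # Diagonal entry: what is left of R_xx[i][i] after removing the earlier rows.
        S[i][i] = sqrt(R_xx[i][i] - sum(S[p][i] ** 2 for p in range(i)))
        for j in range(i + 1, k):
            S[i][j] = (R_xx[i][j] - sum(S[p][i] * S[p][j] for p in range(i))) / S[i][i]
        # The R_xy columns are carried along like extra columns of R_xx,
        # so no back solution is needed.
        for j in range(m):
            T[i][j] = (R_xy[i][j] - sum(S[p][i] * T[p][j] for p in range(i))) / S[i][i]
    return S, T

def column_products(T):
    """Multiply the columns of T together: entry (a, b) is sum_i T[i][a] * T[i][b]."""
    m = len(T[0])
    return [[sum(row[a] * row[b] for row in T) for b in range(m)] for a in range(m)]

# Correlations as read from Table II: four girth measurements (x), height and weight (y).
R_xx = [[1.000, .652, .554, .615],
        [.652, 1.000, .747, .693],
        [.554, .747, 1.000, .774],
        [.615, .693, .774, 1.000]]
R_xy = [[.313, .650],
        [.280, .803],
        [.182, .804],
        [.166, .812]]
R_yy = [[1.000, .495],
        [.495, 1.000]]

S, T = square_root_method(R_xx, R_xy)
YY = column_products(T)                                   # multiple correlation matrix
EE = [[R_yy[a][b] - YY[a][b] for b in range(2)] for a in range(2)]  # multiple alienation matrix

for name, M in (("multiple correlation matrix", YY), ("multiple alienation matrix", EE)):
    print(name, [[round(v, 3) for v in row] for row in M])
```

Carrying the columns of $R_{xy}$ through the same elimination as $R_{xx}$ is what removes the back solution; the printed matrices should agree with the last two blocks of Table II to about the rounding of the tabulated correlations.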

























As a second illustration I use the correlations between the parts of two forms
of the Thorndike Intelligence Examination which Lorge has used in illustrating
canonical correlation technique [11, pp. 69-74]. The X's are the scores on the three
parts of Form A and the Y's are the scores on the three parts of Form B. In
this case we designate the results by r's and k's (rather than ρ's and κ's) since
the calculation is considered to be for a sample. The calculation of the sample
multiple correlation and multiple alienation matrices is presented in Table III.


TABLE III 



Form A 

Form B 



Xi 

Xa 

XI 

yi 

y 2 

yj 






1.0000 

.8235 

■ 









Rw 






_ _ 




1.0000 


.7852 

.8986 

.7841 

.8217 



— 


.8393 

.7961 

.8543 

8254 

Rxv 


— 

— 


.7683 

.8226 

.8588 


1.0000 


.78521 

.8986 

.7841 

.8217 


s„ 



.3609 

.1487 

.3864 

2920 

S xx R7»Rxu 




.5032 

.0180 

' 

.1341 

.2146 






.8299 

9 

.7858 






— 


.7861 

Y'Y 





— 

Bl 

.8069 






.1701 

.0590 

.0054 






— 

.2179 

.0454 

E’E 





— 

— 

.1991 



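10. The numerical values of the coefficients. The diagonal entries of the multiple correlation matrix give the values of $\Sigma \hat{y}_i^2 = \Sigma \hat{y}_i y_i = \rho_{i(x)}^2$ while the non-diagonal values are $\Sigma \hat{y}_i \hat{y}_j = \Sigma \hat{y}_i y_j$. The diagonal entries of the multiple alienation matrix are $\Sigma e_i^2 = \Sigma e_i y_i = \kappa_{i(x)}^2$ while the non-diagonal entries are $\Sigma e_i e_j = \Sigma e_i y_j$. We are then able to write out any of the correlations easily. Thus from Table II

$$\begin{aligned}
\rho_{1(x)} &= \sqrt{\Sigma \hat{y}_1^2} = \sqrt{.117} = .342, \\
\rho_{2(x)} &= \sqrt{\Sigma \hat{y}_2^2} = \sqrt{.794} = .891, \\
\kappa_{1(x)} &= \sqrt{\Sigma e_1^2} = \sqrt{.883} = .940, \\
\kappa_{2(x)} &= \sqrt{\Sigma e_2^2} = \sqrt{.206} = .454, \\
\rho_{e_1 e_2} &= \frac{\Sigma e_1 e_2}{\sqrt{(\Sigma e_1^2)(\Sigma e_2^2)}} = \frac{.274}{\sqrt{(.883)(.206)}} = .642, \\
\rho_{\hat{y}_1 \hat{y}_2} &= \frac{\Sigma \hat{y}_1 \hat{y}_2}{\sqrt{(\Sigma \hat{y}_1^2)(\Sigma \hat{y}_2^2)}} = \frac{.221}{\sqrt{(.117)(.794)}} = .725, \\
\rho_{y_1 e_2} &= \frac{\Sigma e_1 e_2}{\sqrt{\Sigma e_2^2}} = \frac{.274}{\sqrt{.206}} = .604, \\
\rho_{y_2 e_1} &= \frac{\Sigma e_1 e_2}{\sqrt{\Sigma e_1^2}} = \frac{.274}{\sqrt{.883}} = .291, \\
\rho_{y_1 \hat{y}_2} &= \frac{\Sigma \hat{y}_1 \hat{y}_2}{\sqrt{\Sigma \hat{y}_2^2}} = \frac{.221}{\sqrt{.794}} = .248, \\
\rho_{y_2 \hat{y}_1} &= \frac{\Sigma \hat{y}_1 \hat{y}_2}{\sqrt{\Sigma \hat{y}_1^2}} = \frac{.221}{\sqrt{.117}} = .646.
\end{aligned}$$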
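As a hedged sketch of the arithmetic just described, the following Python fragment derives the coefficients for a pair of predicted variables from the multiple correlation and multiple alienation matrices. The function name `coefficients` and the dictionary keys are hypothetical labels chosen for this illustration; the numerical entries are the Table II values quoted in the text.

```python
from math import sqrt

# Multiple correlation and multiple alienation matrices for the Table II illustration.
YY = [[.117, .221], [.221, .794]]
EE = [[.883, .274], [.274, .206]]

def coefficients(YY, EE, i, j):
    """Correlations involving predicted variables i and j (0-based indices)."""
    rho_i, rho_j = sqrt(YY[i][i]), sqrt(YY[j][j])   # multiple correlation coefficients
    kap_i, kap_j = sqrt(EE[i][i]), sqrt(EE[j][j])   # multiple alienation coefficients
    return {
        "multiple_i": rho_i,
        "alienation_i": kap_i,
        "partial": EE[i][j] / (kap_i * kap_j),       # correlation of e_i with e_j
        "predictions": YY[i][j] / (rho_i * rho_j),   # correlation of yhat_i with yhat_j
        "part": EE[i][j] / kap_j,                    # correlation of y_i with e_j
        "cross_multiple": YY[i][j] / rho_j,          # correlation of y_i with yhat_j
    }

for pair in ((0, 1), (1, 0)):
    print(pair, {name: round(v, 3) for name, v in coefficients(YY, EE, *pair).items()})
```

Each coefficient is a single division (or square root) applied to one entry of the two matrices, which is why the scheme extends directly to more than two predicted variables.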


TABLE IYa 


General 

Illuafcration 

Pl(x) 

2yi 

r l/tVi 

T V]Vl 1 r i/lVl 


.9110 

.8299 

.9489 

.8392|.8644 
.7645 

.9603 

.8626 j’ .8747 
.7858 


r 2<») 

27/5 

T - 1/3^3 

r ilVl | r V2l/3 

zyiy* 


.8844 

.7821 

.9917 

.8889 | .8751 
.7861 



rib) 

2^ 



.8983 

.8069 


TABLE IVb 


General 

Illustration 


r *l *E 

r *1 *1 


.3066 

.0298 

/ck-b) 

T | T yi , a 

«ll/l I r Vl «1 

.4124 

.1431|.1264 

.0131 1 .0123 

2«i 

Seie 2 

Se v e, 

.1701 

.0590 

.0054 



<2 



.2214 


ki (x) 

r «iiri 1 r vi «i 


.4668 

. 0973 1 .1033 


2el 

2e 2 e 3 


.2179 

.0454 



&3(*) 



.4394 



Del 



.1931 

















It is possible to utilize a scheme of successive division if all these correlations
are desired when there are more than two predicted variables. By divisions we
compute in turn $\rho_{i(x)}$, $\rho_{\hat{y}_i y_j}$, $\rho_{y_i \hat{y}_j}$, and $\rho_{\hat{y}_i \hat{y}_j}$ from the multiple correlation matrix
and $\kappa_{i(x)}$, $\rho_{e_i y_j}$, $\rho_{y_i e_j}$, and $\rho_{e_i e_j}$ from the multiple alienation matrix for each $i$, $j$. The
computational scheme is illustrated in Table IV where the correlations used
are the sample correlations of Table III. The calculations from the multiple
correlation matrix are presented in Table IVa and those from the multiple
alienation matrix in Table IVb.

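In Table IVa the multiple correlation matrix is first entered on the third of
each three lines. The square root of each diagonal term is then extracted to give
the multiple correlation coefficients. The value of $r_{1(x)}$ is then locked in the
machine as a divisor and it is divided, in turn, into $\Sigma \hat{y}_1 \hat{y}_2$ and $\Sigma \hat{y}_1 \hat{y}_3$ to get $r_{\hat{y}_1 y_2}$ and
$r_{\hat{y}_1 y_3}$. Then $r_{2(x)}$ is used as a divisor by division into $r_{\hat{y}_1 y_2}$ to get $r_{\hat{y}_1 \hat{y}_2}$, into $\Sigma \hat{y}_1 \hat{y}_2$
to get $r_{y_1 \hat{y}_2}$, and into $\Sigma \hat{y}_2 \hat{y}_3$ to get $r_{\hat{y}_2 y_3}$. Finally $r_{3(x)}$ is divided into $r_{\hat{y}_1 y_3}$ to get
$r_{\hat{y}_1 \hat{y}_3}$, into $\Sigma \hat{y}_1 \hat{y}_3$ to get $r_{y_1 \hat{y}_3}$, into $r_{\hat{y}_2 y_3}$ to get $r_{\hat{y}_2 \hat{y}_3}$, and into $\Sigma \hat{y}_2 \hat{y}_3$ to get $r_{y_2 \hat{y}_3}$. A
check on these divisions can be made, if desired, by dividing $r_{y_1 \hat{y}_2}$ by $r_{1(x)}$ to get
$r_{\hat{y}_1 \hat{y}_2}$, $r_{y_1 \hat{y}_3}$ by $r_{1(x)}$ to get $r_{\hat{y}_1 \hat{y}_3}$, and $r_{y_2 \hat{y}_3}$ by $r_{2(x)}$ to get $r_{\hat{y}_2 \hat{y}_3}$.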

Table IVb is treated in a similar manner. 

This technique is immediately applicable to the case of many predicted 
variables. 


REFERENCES

[1] M. Ezekiel, Methods of Correlation Analysis, Second Edition, Wiley, 1942.

[2] A. C. Aitken, "On the evaluation of determinants, the formation of their adjugates, etc.," Edinb. Math. Soc. Proc., Series 2, Vol. 3 (1933), pp. 207-219.

[3] P. S. Dwyer, "The evaluation of determinants," Psychometrika, Vol. 6 (1941), pp. 191-204.

[4] A. C. Aitken, Determinants and Matrices, Second Edition, Oliver and Boyd, Edinburgh, 1942.

[5] F. V. Waugh and P. S. Dwyer, "Compact computation of the inverse of a matrix," Annals of Math. Stat., Vol. 16 (1945), pp. 259-271.

[6] P. S. Dwyer, "A matrix presentation of least squares and correlation theory, etc.," Annals of Math. Stat., Vol. 15 (1944), pp. 82-96.

[7] P. S. Dwyer, "The square root method and its use in correlation and regression," Am. Stat. Assn. Jour., Vol. 40 (1945), pp. 493-503.

[8] D. B. Duncan and J. F. Kenney, On the Solution of Normal Equations and Related Topics, Edwards Brothers, Ann Arbor, 1946.

[9] A. C. Aitken, "The evaluation of a certain triple product matrix," Roy. Soc. of Edinb. Proc., Vol. 57 (1937), pp. 172-181.

[10] H. C. Carver, Anthropometric Data, Edwards Brothers, Ann Arbor, 1941.

[11] Irving Lorge, "The computation of Hotelling canonical correlations," Proceedings of Educational Research Forum, Endicott, N. Y., Aug. 26-31, 1940, pp. 68-74.



INVERSION FORMULAS IN NORMAL VARIABLE MAPPING 

By John Riordan 
Bell Telephone Laboratories, New Y ork 

1. Summary. The two inversion formulas considered here arise from study of G. A. Campbell's work on the Poisson summation, which is described more fully in the introduction and in the main consists of finding a function or mapping of a variable connected with the summation in terms of a normal (Gaussian) variable $g$. More generally, this last is a process often called "normalization of the variable" and associated with the names of E. A. Cornish and R. A. Fisher. The mapping is two-way and the main inversion formula determines coefficients for one way from those for the other, both sets of coefficients being descriptive of their mappings. More precisely, if $x$ is a given variable, $g$ a Gaussian variable, $y$ a parameter of the mapping, and the two mappings are

$$x = g + \sum_{n=1}^{\infty} G_n(g)\, y^n / n!$$

$$g = x + \sum_{n=1}^{\infty} X_n(x)\, y^n / n!,$$

the formula expresses $G_n(x)$ in terms of $X_i(x)$, $i \le n$, and vice versa.

The second formula is more particularly related to the Poisson summation and relates coefficients $p_n = p_n(g)$ and $q_n = q_n(g)$ in the pair of equations

$$a = c \sum_{n=0}^{\infty} q_n\, c^{-n/2} / n!$$

$$c = a \sum_{n=0}^{\infty} p_n\, a^{-n/2} / n!$$

Both formulas, which are necessarily elaborate, are given concise expression by the use of the multi-variable polynomials of E. T. Bell.

2. Introduction. In 1923, in a paper little known in statistical circles, G. A. Campbell [2] gave as the basis for his extensive tabulation of the Poisson summation an asymptotic series expressing the average $a$ in terms of a normal variable $g$, corresponding to the probability of at least $c$ occurrences, and $c$ itself. That is to say, he associated with the Poisson summation

$$P(a, c) = \sum_{x=c}^{\infty} e^{-a} a^x / x!$$

a normal variable $g$, defined by

$$P(a, c) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{g} e^{-x^2/2}\, dx$$

and inverted the summation (which, as is well known, is equivalent to the incomplete Gamma function ratio) to give a series for $a$ in terms of $g$ and $c$. The series, which is carried to 11 terms, starts as follows:

$$a \sim c \left[ 1 + g c^{-1/2} + \frac{g^2 - 1}{3} c^{-1} + \frac{g^3 - 7g}{36} c^{-3/2} + \cdots \right]$$

If $x = (a - c) c^{-1/2}$ is introduced, this becomes

$$x \sim g + \frac{g^2 - 1}{3} c^{-1/2} + \frac{g^3 - 7g}{36} c^{-1} + \cdots$$

and $x$ is seen to be, like $g$, a standardized variable of mean 0, variance 1.

It seems to have gone unnoticed that this result includes the $\chi^2$ distribution through the transformation $2a = \chi^2$, $2c = n$, and it has been rediscovered by A. M. Peiser [7] (4 terms) and by Goldberg and Levine [4] (6 terms).

It is possible also to express $c$ in terms of $a$ and $g$, and a formula of this kind with fewer terms, which appears in a footnote in Campbell's paper, is as follows:

$$c \sim a \left[ 1 - g a^{-1/2} + \frac{g^2 + 2}{6} a^{-1} + \frac{g^3 + 2g}{72} a^{-3/2} + \cdots \right]$$

Finally there is a third possibility of expressing $g$ in terms of the remaining variables, preferably $x$ and $c$; though unnoticed by Campbell this has since been brought to prominence by Cornish and Fisher [3], Hotelling and Frankel [5], and Kendall [6].

The idea behind the first expansion appears most clearly in the second form
and is that for $c$ large the variable $x$ behaves nearly like $g$. The third possibility
reverses this expansion and gives a function of $x$ and $c$ which behaves like $g$;
hence if this function is first evaluated, reference to the normal integral table
gives an immediate evaluation of the probabilities in question. Put in another
way, the expansion widens the scope of the normal integral table and for this
reason has been called "normalization" of the variable (but this term seems
preempted by its use in another sense for orthogonal functions, and has been
replaced in the title by normal variable mapping).

From the point of view of statistical theory, the three expressions are different
versions of one relationship, which suggests that there should be general rules
for transforming a series of one type into that of another. The two inversion
formulas given below supply these rules in what appears to be as compact a form
as the problem allows. It will be noted that the proofs given suppose convergent
series, a case which leads to clarity and brevity and is interesting in itself.
Applied to Campbell's series, they give the known results so far as the latter go,
but of course for other asymptotic series they need independent verification.


3. First Inversion Formula. This relates coefficients in series like Campbell's first and its reverse as in Cornish and Fisher. More precisely:

If $G_1(g), G_2(g), \ldots$ are assigned polynomials and if

$$x = g + \sum_{n=1}^{\infty} G_n(g)\, y^n / n! \tag{1}$$

defines $x$ in terms of $g$ and a parameter $y$, then

$$g = x + \sum_{n=1}^{\infty} X_n(x)\, y^n / n! \tag{2}$$

where

$$-X_n(x) = Y_n(\alpha G_1(x), \alpha G_2(x), \ldots, \alpha G_n(x)), \tag{3}$$

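TABLE I

Polynomials $Y_n(fg_1, fg_2, \ldots, fg_n)$

$$\begin{aligned}
Y_1 &= f_1 g_1 \\
Y_2 &= f_1 g_2 + f_2 g_1^2 \\
Y_3 &= f_1 g_3 + f_2 (3 g_2 g_1) + f_3 g_1^3 \\
Y_4 &= f_1 g_4 + f_2 (4 g_3 g_1 + 3 g_2^2) + f_3 (6 g_2 g_1^2) + f_4 g_1^4 \\
Y_5 &= f_1 g_5 + f_2 (5 g_4 g_1 + 10 g_3 g_2) + f_3 (10 g_3 g_1^2 + 15 g_2^2 g_1) + f_4 (10 g_2 g_1^3) + f_5 g_1^5 \\
Y_6 &= f_1 g_6 + f_2 (6 g_5 g_1 + 15 g_4 g_2 + 10 g_3^2) + f_3 (15 g_4 g_1^2 + 60 g_3 g_2 g_1 + 15 g_2^3) \\
&\quad + f_4 (20 g_3 g_1^3 + 45 g_2^2 g_1^2) + f_5 (15 g_2 g_1^4) + f_6 g_1^6 \\
Y_7 &= f_1 g_7 + f_2 (7 g_6 g_1 + 21 g_5 g_2 + 35 g_4 g_3) \\
&\quad + f_3 (21 g_5 g_1^2 + 105 g_4 g_2 g_1 + 70 g_3^2 g_1 + 105 g_3 g_2^2) \\
&\quad + f_4 (35 g_4 g_1^3 + 210 g_3 g_2 g_1^2 + 105 g_2^3 g_1) \\
&\quad + f_5 (35 g_3 g_1^4 + 105 g_2^2 g_1^3) + f_6 (21 g_2 g_1^5) + f_7 g_1^7 \\
Y_8 &= f_1 g_8 + f_2 (8 g_7 g_1 + 28 g_6 g_2 + 56 g_5 g_3 + 35 g_4^2) \\
&\quad + f_3 (28 g_6 g_1^2 + 168 g_5 g_2 g_1 + 280 g_4 g_3 g_1 + 210 g_4 g_2^2 + 280 g_3^2 g_2) \\
&\quad + f_4 (56 g_5 g_1^3 + 420 g_4 g_2 g_1^2 + 280 g_3^2 g_1^2 + 840 g_3 g_2^2 g_1 + 105 g_2^4) \\
&\quad + f_5 (70 g_4 g_1^4 + 560 g_3 g_2 g_1^3 + 420 g_2^3 g_1^2) \\
&\quad + f_6 (56 g_3 g_1^5 + 210 g_2^2 g_1^4) + f_7 (28 g_2 g_1^6) + f_8 g_1^8
\end{aligned}$$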

$Y_n$ being the multivariable polynomial of E. T. Bell [1], in the variables $G_1(x)$ to $G_n(x)$ and the symbolic variable $\alpha$ which is such that

$$\alpha^i \equiv \alpha_i = (-D)^{i-1}, \qquad D = d/dx,$$

with differentiations on all products of $G_1(x)$ to $G_n(x)$ associated with it in the polynomial.

Note the symmetry of $x$ and $g$, which allows the transformation to go either way, the inverse of (3) being

$$-G_n(g) = Y_n(\alpha X_1(g), \alpha X_2(g), \ldots, \alpha X_n(g)). \tag{4}$$

Table I gives explicit expressions for polynomials $Y_1$ to $Y_8$. It will be noted that the number of terms in $Y_n$ is the number of partitions of $n$ and that $f_i$, the variable replacing $\alpha_i$ in the table, is associated with terms corresponding to partitions with $i$ parts; that is to say, if $Y_{n,i}$ designates such terms,

$$Y_n = \sum_{i=1}^{n} f_i Y_{n,i}.$$

The verification or extension of the table may be accomplished by the formulas 
and relations given by Bell (l.c.) or more directly by those modifications of Bell 
given by myself in [8]. 
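One way to verify or extend Table I by machine is sketched below. It is not Bell's recurrence nor the modification in [8]; it uses the standard closed form in which the partition of $n$ with $m_j$ parts equal to $j$ carries the coefficient $n!/\prod_j (j!)^{m_j} m_j!$ and belongs to $f_i$, where $i$ is the number of parts. The function names `partitions` and `bell_terms` are hypothetical.

```python
from collections import Counter
from math import factorial

def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples, e.g. (3, 1) for 3 + 1."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def bell_terms(n):
    """Map each partition of n to its numerical coefficient in Y_n.

    A partition (j1, j2, ..., ji) stands for the product g_j1 * g_j2 * ... * g_ji,
    and its length i says which f_i multiplies it.
    """
    terms = {}
    for p in partitions(n):
        denom = 1
        for part, mult in Counter(p).items():
            denom *= factorial(part) ** mult * factorial(mult)
        terms[p] = factorial(n) // denom
    return terms

for n in range(1, 5):
    print(n, bell_terms(n))
```

For instance, `bell_terms(4)` contains the key `(2, 1, 1)` with value 6, which is the term $f_3(6 g_2 g_1^2)$ of $Y_4$ in Table I.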

The first few instances of (3), dropping the common variable $x$ for brevity, may be read off from Table I (with appropriate changes of notation and interpretation of $\alpha_i$) as follows:

$$\begin{aligned}
-X_1 &= G_1 \\
-X_2 &= G_2 - D(G_1^2) \\
-X_3 &= G_3 - 3D(G_2 G_1) + D^2(G_1^3) \\
-X_4 &= G_4 - 4D(G_3 G_1) - 3D(G_2^2) + 6D^2(G_2 G_1^2) - D^3(G_1^4)
\end{aligned}$$

Applied to Campbell's first formula in its second form with $y = c^{-1/2}$ and

$$\begin{aligned}
G_1(x) &= (x^2 - 1)/3, & G_3(x) &= (-6x^4 - 14x^2 + 32)/270, \\
G_2(x) &= (x^3 - 7x)/18, & G_4(x) &= (9x^5 + 256x^3 - 433x)/1620,
\end{aligned}$$

these show e.g.

$$-X_2 = \frac{x^3 - 7x}{18} - \frac{2(x^2 - 1)}{3} \cdot \frac{2x}{3} = \frac{-7x^3 + x}{18}$$

and similarly for the others, resulting in

$$\begin{aligned}
X_1 &= -(x^2 - 1)/3 \\
X_2 &= (7x^3 - x)/18 \\
X_3 &= -(219x^4 - 14x^2 - 13)/270 \\
X_4 &= (3993x^5 - 152x^3 + 119x)/1620
\end{aligned}$$

These determine a calculation formula for the Poisson summation, which is a refinement of the normal approximation. That is to say

$$P(a, c) = \Phi(g) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{g} e^{-t^2/2}\, dt$$

with

$$g = x - \frac{x^2 - 1}{3\sqrt{c}} + \frac{7x^3 - x}{36c} - \frac{219x^4 - 14x^2 - 13}{1620\, c\sqrt{c}} + \frac{3993x^5 - 152x^3 + 119x}{38880\, c^2}$$

and $x = (a - c)/\sqrt{c}$.
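A small numerical check of this calculation formula can be sketched in Python using only the standard library. The function names are hypothetical, and the coefficient of the $c^{-2}$ term is taken here as $1/(24 \cdot 1620) = 1/38880$, an assumption obtained by recomputing $X_4$ from (3); the exact tail is summed directly for integer $c$.

```python
from math import erf, exp, sqrt

def normal_cdf(g):
    """Standard normal integral from -infinity to g."""
    return 0.5 * (1.0 + erf(g / sqrt(2.0)))

def mapped_normal_variable(a, c):
    """Series for g in terms of x = (a - c)/sqrt(c), through the c**-2 term."""
    x = (a - c) / sqrt(c)
    return (x
            - (x**2 - 1) / (3 * sqrt(c))
            + (7 * x**3 - x) / (36 * c)
            - (219 * x**4 - 14 * x**2 - 13) / (1620 * c * sqrt(c))
            + (3993 * x**5 - 152 * x**3 + 119 * x) / (38880 * c**2))  # assumed denominator

def poisson_tail(a, c):
    """Exact P(a, c): probability of at least c occurrences when the average is a."""
    term, below = exp(-a), 0.0
    for x in range(c):          # accumulate the probabilities of 0, 1, ..., c - 1
        below += term
        term *= a / (x + 1)
    return 1.0 - below

a, c = 12.0, 10
crude = normal_cdf((a - c) / sqrt(c))               # plain normal approximation
refined = normal_cdf(mapped_normal_variable(a, c))  # normal integral at the mapped g
print(poisson_tail(a, c), refined, crude)
```

Comparing the three printed numbers shows how much the mapped variable $g$ improves on the unmapped $x$ for a moderate value of $c$.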





For the $t$-variate, the formula is applied in the reverse direction since Hotelling and Frankel supply the first four values of $X_n$, that is, in present notation, the series

$$g \sim x - \frac{x^3 + x}{4}\, y + \frac{13x^5 + 8x^3 + 3x}{48} \frac{y^2}{2} - \frac{35x^7 + 19x^5 + x^3 - 15x}{64} \frac{y^3}{6} + \frac{6271x^9 + 3224x^7 - 102x^5 - 1680x^3 - 945x}{3840} \frac{y^4}{24} + \cdots$$

The reversed series (obtained by (4)) is

$$x \sim g + \frac{g^3 + g}{4}\, y + \frac{5g^5 + 16g^3 + 3g}{48} \frac{y^2}{2} + \frac{3g^7 + 19g^5 + 17g^3 - 15g}{64} \frac{y^3}{6} + \frac{79g^9 + 776g^7 + 1482g^5 - 1920g^3 - 945g}{3840} \frac{y^4}{24} + \cdots$$

The first three terms are checked by Goldberg and Levine (l.c.).

Another application worth noting is to the formulas of Cornish and Fisher which give $G_n(g)$ and $X_n(x)$ in terms of the relative cumulants of the distribution; to save space these are omitted.

The derivation of the formula may be indicated most easily by Lagrange's formula for the expansion of one function in powers of another in the following form¹:

Let $C$ be a contour in the complex $z$ plane enclosing the point $z = x$, and let $f(z)$ and $\phi(z)$ be analytic on and inside $C$. Let $y$ be such that $|y\phi(z)| < |z - x|$ when $z$ is on $C$, and $g$ be that root of the equation

$$g = x + y\phi(g) \tag{5}$$

which lies inside $C$. Then

$$f(g) = \frac{1}{2\pi i} \int_C f(z)\, \frac{d}{dz}\{\log [z - x - y\phi(z)]\}\, dz = f(x) + \sum_{n=1}^{\infty} X_n^*(x)\, y^n / n! \tag{6}$$

where

$$X_n^*(x) = D^{n-1}[f'(x)(\phi(x))^n], \qquad D = d/dx. \tag{7}$$

The contour integral in (6) appears, slightly disguised, as a problem in Whittaker and Watson [Modern Analysis, Cambridge, 1920, p. 149]. The evaluation (7) is given for completeness, though no use is made of it in this section, the derivation proceeding directly from (6).

First notice that by (1) and (5)

$$-y\phi(g) = \sum_{n=1}^{\infty} G_n(g)\, y^n / n!,$$

¹ The author owes the suggestion for this to S. O. Rice, who also simplified the derivation of the second inversion formula given later.



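so that the logarithm in (6) may be written

$$\log \Bigl( z - x + \sum_{n=1}^{\infty} G_n(z)\, y^n / n! \Bigr),$$

or

$$\log (z - x) + \log \Bigl[ 1 + \sum_{n=1}^{\infty} G_n(z)(z - x)^{-1} y^n / n! \Bigr],$$

or

$$\log (z - x) + \log \exp by, \tag{8}$$

with $b$ a symbolic variable such that

$$b^0 \equiv b_0 = 1, \qquad b^n \equiv b_n = G_n(z)(z - x)^{-1}.$$

Now if

$$\log (\exp by) = B_1 y + B_2 y^2 / 2! + \cdots = \exp By, \tag{9}$$

$B$ being another symbolic variable, $B^0 \equiv B_0 = 0$, $B^n \equiv B_n$, it follows from equation (5) of [8] that

$$\begin{aligned}
B_n &= [D_y^n \log (\exp by)]_{y=0}, \qquad D_y = d/dy, \\
&= Y_n(\beta b_1, \beta b_2, \ldots, \beta b_n) \\
&= \sum_{i=1}^{n} \beta_i Y_{n,i}(b_1, b_2, \ldots, b_n),
\end{aligned} \tag{10}$$

with $\beta^i \equiv \beta_i = (-1)^{i-1}(i - 1)!$ and $Y_{n,i}$ the part of polynomial $Y_n$ having $i$ parts, as defined above. Moreover, each factor $b_k$ of terms in $Y_{n,i}$ contributes $G_k(z)(z - x)^{-1}$ so that

$$B_n = \sum_{i=1}^{n} \beta_i (z - x)^{-i}\, Y_{n,i}(G_1(z), G_2(z), \ldots, G_n(z)). \tag{11}$$

Then, by (6),

$$\begin{aligned}
f(g) &= \frac{1}{2\pi i} \int_C f(z) \Bigl( \frac{1}{z - x} + \frac{d}{dz} \exp By \Bigr) dz \\
&= f(x) - \frac{1}{2\pi i} \int_C f'(z) \exp By\, dz \\
&= f(x) - \sum_{n=1}^{\infty} \frac{y^n}{n!} \sum_{i=1}^{n} \frac{\beta_i}{2\pi i} \int_C \frac{Y_{n,i}(G_1(z), \ldots, G_n(z))\, f'(z)}{(z - x)^i}\, dz \\
&= f(x) - \sum_{n=1}^{\infty} \frac{y^n}{n!} \sum_{i=1}^{n} (-D)^{i-1} [f'(x)\, Y_{n,i}(G_1(x), \ldots, G_n(x))]
\end{aligned}$$

with $D = d/dx$. The evaluation in the last line is by the Cauchy formula for derivatives; the second line is derived by an integration by parts.

Equation (3) follows from this and the substitution $f(g) = g$.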


4. Second Inversion Formula. This gives the interrelations of coefficients of series like the two Campbell series mentioned in the introduction. It runs as follows:

If $q_1(g), q_2(g), \ldots$ are given polynomials and if

$$a = c \sum_{n=0}^{\infty} \frac{q_n(g)\, c^{-n/2}}{n!} \tag{12}$$

defines $a$ in terms of $g$ and a parameter $c$; then

$$c = a \sum_{n=0}^{\infty} \frac{p_n(g)\, a^{-n/2}}{n!} \tag{13}$$

where

$$-p_n(g) = Y_n(\alpha q_1(g), \alpha q_2(g), \ldots, \alpha q_n(g)) \tag{14}$$

with $\alpha \equiv \alpha_1 = 1$; $\alpha^i \equiv \alpha_i = (n - 4)(n - 6) \cdots (n - 2i)\, 2^{-i+1}$.

Equation (14) is formally similar to (3) and, by symmetry as before, $q_n(g)$ is readily expressible as a $Y_n$ polynomial in $p_1(g)$ to $p_n(g)$.

The first five instances of (14), dropping the argument for brevity, are

$$\begin{aligned}
-p_1 &= q_1 \\
-p_2 &= q_2 - q_1^2 \\
-p_3 &= q_3 - \tfrac{3}{2} q_2 q_1 + \tfrac{3}{4} q_1^3 \\
-p_4 &= q_4 \\
-p_5 &= q_5 + \tfrac{5}{2}(q_4 q_1 + 2 q_3 q_2) - \tfrac{5}{4}(2 q_3 q_1^2 + 3 q_2^2 q_1) + \tfrac{15}{4} q_2 q_1^3 - \tfrac{15}{16} q_1^5
\end{aligned}$$

Applied to Campbell's first series where

$$\begin{aligned}
q_1(g) &= g, & q_3(g) &= (g^3 - 7g)/6, \\
q_2(g) &= \tfrac{2}{3}(g^2 - 1), & q_4(g) &= (-12g^4 - 28g^2 + 64)/135, \\
& & q_5(g) &= (36g^5 + 1024g^3 - 1732g)/1296,
\end{aligned}$$

these show that

$$\begin{aligned}
p_1(g) &= -g, & p_3(g) &= (g^3 + 2g)/12, \\
p_2(g) &= (g^2 + 2)/3, & p_4(g) &= (12g^4 + 28g^2 - 64)/135, \\
& & p_5(g) &= (207g^5 + 2596g^3 - 6148g)/1296.
\end{aligned}$$

The proof of (14) is as follows. First, for brevity introduce symbolic variables $p$ and $q$ with the usual interpretation $p^n \equiv p_n(g)$, $q^n \equiv q_n(g)$ so that (12) and (13) read

$$a = c \exp (q c^{-1/2}), \qquad c = a \exp (p a^{-1/2}).$$

Now write $a = 1/x^2$, $c = 1/y^2$, changing these to

$$x = y (\exp qy)^{-1/2}, \qquad y = x (\exp px)^{-1/2},$$

and note that

$$x^2 y^{-2} = (\exp qy)^{-1} = \exp px \tag{15}$$

which shows that $p_n$ is the coefficient of $x^n / n!$ in the expansion in powers of $x$ of $(\exp qy)^{-1}$. Lagrange's formula gives at once ($D = d/dy$):

$$X_n^* = D^{n-1} [f'(y) (\exp qy)^{n/2}]_{y=0} \tag{16}$$

so that

$$(\exp qy)^{-1} = 1 + \sum_{n=1}^{\infty} \frac{x^n}{n!} D^{n-1} [-(\exp qy)^{(n-4)/2} D(\exp qy)]_{y=0}$$

or

$$-p_n = \frac{2}{n - 2} [D^n (\exp qy)^{(n-2)/2}]_{y=0} = Y_n(\alpha q_1, \alpha q_2, \ldots, \alpha q_n) \tag{17}$$

with $\alpha_i$ as in (14), by equation (5) of [8].

BIBLIOGRAPHY

[1] Bell, E. T., "Exponential polynomials," Annals of Math., Vol. 35 (1934), pp. 258-277.

[2] Campbell, G. A., "Probability curves showing Poisson's exponential summation," Bell System Tech. Jl., Vol. 2 (1923), pp. 95-113; Collected Papers, New York, 1937, pp. 234-242.

[3] Cornish, E. A. and Fisher, R. A., "Moments and cumulants in the specification of distributions," Revue de l'Institut Intern. de Stat., Vol. 4 (1937), pp. 1-14.

[4] Goldberg, H. and Levine, H., "Approximate formulas for the percentage points and normalization of t and χ²," Annals of Math. Stat., Vol. 17 (1946), pp. 216-225.

[5] Hotelling, H. and Frankel, L. R., "The transformation of statistics to simplify their distribution," Annals of Math. Stat., Vol. 9 (1938), pp. 87-96.

[6] Kendall, M. G., The Advanced Theory of Statistics, I, London, 1943.

[7] Peiser, A. M., "Asymptotic formulas for significance levels of certain distributions," Annals of Math. Stat., Vol. 14 (1943), pp. 56-62.

[8] Riordan, John, "Derivatives of composite functions," Bull. Am. Math. Soc., Vol. 52 (1946), pp. 664-667.



ON THE DETERMINATION OF OPTIMUM PROBABILITIES IN SAMPLING

By Morris H. Hansen and William N. Hurwitz
Bureau of the Census


1. Summary. In a previous paper [2] it was shown that it is sometimes
profitable to select sampling units with probability proportionate to size of the
unit. This note indicates a method of determining the probabilities of selection
which minimize the variance of the sample estimate at a fixed cost. Some
approximations that have practical applications are given.


2. Introduction. Neyman has shown that it is possible to reduce the sampling
variance of an estimate by dividing a population into sub-populations (called
strata) and varying the proportions of units included in the sample from stratum
to stratum [1]. His treatment presumed that the units within any stratum would
be drawn with equal probability. In many practical sampling problems, the use
of constant probabilities is neither necessary nor desirable. Not only is it possible
to obtain unbiased or consistent estimates with varying probabilities of selection
of the sampling units, but also it is possible to reduce the variance of sample
estimates by appropriate use of this device.

It has been shown [2] that in a subsampling system, the selection of primary
units with probabilities proportionate to the number of elements included in the
primary unit may bring about marked reductions in sampling variances over
sampling with equal probabilities. In this note, we shall indicate a method of
determining the optimum probabilities under certain conditions, and also some
approximations to the optima that have practical applications.

By optimum probabilities, we mean the set of probabilities of selection that
will minimize the variance for a fixed cost of obtaining sample results, or
alternatively that will minimize the cost for a fixed sampling error.


3. Optimum probability with a subsampling system. Consider, for example, the simple subsampling system where primary units are first drawn for inclusion in the sample and then a sample of elements is drawn from the selected primary units. We shall suppose, for simplicity of notation, that the sampling is done without stratification. The conclusions indicated below will be similar if stratified sampling is used, and they will hold even if only one unit is drawn from each stratum. Suppose that a population contains $M$ primary units, and that the sampling of primary units is to be done with replacement. Sampling with replacement is assumed in order to simplify the mathematics. We wish to estimate the ratio

$$\frac{X}{Y} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{N_i} X_{ij}}{\sum_{i=1}^{M} \sum_{j=1}^{N_i} Y_{ij}}$$

where $X_{ij}$ and $Y_{ij}$ are the values of two characteristics of the $j$th element within the $i$th primary unit, and $N_i$ is the number of elements in the $i$th primary unit. A consistent estimate of $X/Y$ is given by

$$r = \frac{\sum_{i=1}^{m} \frac{1}{P_i} \frac{N_i}{n_i} \sum_{j=1}^{n_i} x_{ij}}{\sum_{i=1}^{m} \frac{1}{P_i} \frac{N_i}{n_i} \sum_{j=1}^{n_i} y_{ij}} \tag{1}$$

where

$P_i$ = The probability of selecting the $i$th primary unit on a single draw.

$n_i$ = The total number of elements included in the sample from the $i$th unit if it is drawn. If a particular unit happens to be included in the sample more than once the subsampling will be independently carried through each time it is drawn.

$m$ = The total number of primary units included in the sample.

It will be assumed that a self-weighting sample is to be used, i.e., that although the probabilities of selecting primary units will vary, the subsampling rate within the $i$th selected primary unit, $n_i / N_i$, will be such that $P_i\, n_i / N_i = k$. Note that, with this condition, $k$ is the probability that an element will be included in the sample by making a single draw of a primary unit, and by carrying out the specified subsampling within the selected primary unit. It follows that $mkN$ is the expected total number of elements included in a sample of $m$ primary units, where

$$N = \sum_{i=1}^{M} N_i.$$

The method can be extended to cover situations where other conditions are imposed.

We shall express the variance of $r$ in terms of $P_i$, $m$, and $k$, and also express the cost in terms of these same quantities. The optimum values of $P_i$, $m$, and $k$ will then be determined.

The variance of the sample estimate. To terms of order 1/m of the Taylor ex¬ 
pansion of a ratio, the sampling variance of the estimate (1) is approximately 


( 2 ) 


At , v' jv; w. - n, , 

p- h, + 2 -, p- • „ 

t — 1 i I l™I -t t iV % 





X, 


N, 

d_ 

A, 


where 




The cost function. Now suppose that the total cost of the sampling procedure involves a fixed cost attached to each primary unit included in the sample, a cost of listing the elements within each selected primary unit (this listing may be necessary in order to draw a subsample), and a cost of obtaining information from each of the elements selected for inclusion in the sample. Under these circumstances the total expected cost of the survey will be:

$$C = C_1 m + C_2 m \sum_{i=1}^{M} P_i N_i + C_3 m k N \tag{3}$$

where

$C_1$ = The fixed cost per primary unit,

$C_2$ = The cost of listing one element in a selected primary unit and other costs that vary with the number of elements to be listed,

$C_3$ = The cost of obtaining the required information from one element in the sample,

$\sum_{i=1}^{M} P_i N_i$ = The expected number of elements per primary unit in the sample,

$mk$ = The over-all sampling ratio, and

$N = \sum_{i=1}^{M} N_i$ = The total number of elements in the population.

It will be noted that although the values of $P_i$ and $m$ may be fixed in advance, the number of elements to be listed, $\sum_{i=1}^{m} N_i$, remains a chance variable. It is for this reason that we consider the expected cost rather than the actual cost.

The optimum values of $P_i$, $m$, and $k$. The values of $P_i$, $m$, and $k$ which minimize the variance (2) subject to the conditions that

$$\sum_{i=1}^{M} P_i = 1,$$

$C$ is fixed,





are given by 


(4) 


(5) 


( 6 ) 

where 


P, = 




nU. 


C\ + C 2 W, 






+ C 2 Ni 


E N t «i 

1-1 


h = 


N 


S/cT 


Nik 


+ CtN. 

c 


Cx + C 2 E P.tf, + C 3 fciV 
1-1 




= A 2 . 


JV,' 


Ordinarily $\delta_i$ will be positive although it will often be found to be negative for
some $i$. For a great many populations, such negative values can be avoided by
classifying the primary units into size groups or other significant groups and then
requiring that the probability of selection be $P_\alpha$ for every primary unit in the
$\alpha$-th group.

In actual practice, however, in advance of designing a sample one does not
have the data to compute the optima and must use methods of approximating the
optimum probabilities. Such methods are given below.

4. Some rules for approximating the optimum probabilities. In another
paper [2] considerations were presented from which it follows that $\delta_i$ tends to
decrease with increasing size of unit, but seldom as fast as the size of unit
increases. The rate of decrease is often small relative to the increase in $N_i$, and
empirical data for a number of problems indicate that even the assumption of
$\delta_i$ being fairly constant with increasing size of unit may not lead one far astray
from the optimum probabilities. Under this assumption ($\delta_i = \delta$ for all $i$) the
probabilities depend only on $N_i$, $C_1$, and $C_2$, and lead to the following results:

(a) When $C_1 > 0$ and $C_2 = 0$, probability proportionate to size will be the
optimum.

(b) When $C_1 = 0$ and $C_2 > 0$, probability proportionate to the square root
of the size will be the optimum.





MORRIS H. HANSEN AND WILLIAM N. HURWITZ 


If we go to the other extreme (extreme not in terms of mathematically possible
values but in terms of most practical populations), and assume that $\delta_i$ decreases
at the same rate that $N_i$ increases, the results would be:

(a) When $C_1 > 0$ and $C_2 = 0$, probability proportionate to the square root
of the size will be the optimum.

(b) When $C_1 = 0$ and $C_2 > 0$, equal probability will be the optimum.
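The two extremes can be checked numerically. The sketch below assumes that the optimum probabilities are proportional to $N_i \sqrt{\delta_i / (C_1 + C_2 N_i)}$; this form is an assumption, inferred here from the limiting cases stated in the rules rather than taken from a legible formula. The function name, the block sizes, and the cost values are hypothetical.

```python
import math

def selection_probabilities(sizes, deltas, c1, c2):
    """Normalize the assumed weights N_i * sqrt(delta_i / (C1 + C2 * N_i))."""
    weights = [n * math.sqrt(d / (c1 + c2 * n)) for n, d in zip(sizes, deltas)]
    total = sum(weights)
    return [w / total for w in weights]

sizes = [10, 40, 90, 160]  # hypothetical primary-unit sizes N_i

# First extreme: delta_i constant for all i.
constant = [1.0] * len(sizes)
pps = selection_probabilities(sizes, constant, c1=5.0, c2=0.0)   # proportional to N_i
root = selection_probabilities(sizes, constant, c1=0.0, c2=0.1)  # proportional to sqrt(N_i)

# Second extreme: delta_i decreases at the rate N_i increases (delta_i = 1 / N_i).
falling = [1.0 / n for n in sizes]
root2 = selection_probabilities(sizes, falling, c1=5.0, c2=0.0)  # proportional to sqrt(N_i)
equal = selection_probabilities(sizes, falling, c1=0.0, c2=0.1)  # equal probabilities
```

Under these assumptions the four results reproduce rules (a) and (b) of each extreme; intermediate cost ratios give probabilities lying between the corresponding pair.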

The minimum is broad in the neighborhood of the optimum and the results for
either of these extremes and the values in between often will give results
reasonably close to the minimum. This leads to the following useful approximations:

(a) When $C_2 \sum P_i N_i$, the expected cost per primary unit of listing and related
operations, is small in relation to $C_1$, the fixed cost per primary unit, the
optimum probabilities will be between probability proportionate to size
and probability proportionate to the square root of size, and either of these
will be reasonably close to the optimum.

(b) When $C_1$ is small compared to $C_2 \sum P_i N_i$, the optimum probability will be
between equal probability and probability proportionate to the square root
of size, and either of these will be reasonably close to the optimum.

(c) When both $C_1$ and $C_2 \sum P_i N_i$ are of significant size, i.e., when the costs
vary substantially both with the number of primary units in the sample
and the size of the units, then probability proportionate to the square root
of the size will be a reasonably good approximation to the optimum.

(d) When units of small size are used and all of the subunits in the selected
primary units are included in the sample (that is, there is no subsampling)
equal probability is close to the optimum. It should be noted that this
rule does not follow directly from the above analysis based on subsampling,
but from a separate analysis in which no subsampling is involved.

For whatever system of probabilities is used, and with the cost function given 
by (3), the optimum value of k is given by: 



which can be approximated, in application, from prior experience or preliminary 
studies. The corresponding optimum value for m is obtained by substitution in 
the cost function 

The above results should not be accepted, of course, as the optima for every 
cost function or every sampling system. Either past experimental data may be 
available or pilot tests made to determine the cost function and the appropriate 
approximations that should be used in various practical situations. 

An illustration. An illustration may be of interest. A characteristic
published for city blocks in the 1940 Census of Housing is the number of dwelling
units that are in need of major repairs or that lack a private bath. Suppose we





were sampling to estimate the proportion of the dwelling units having this
characteristic for the Bronx in New York City, at the time of the 1940 Census. Let
us assume that once we selected a system of probabilities we used the optimum
numbers of blocks and the optimum sampling ratios appropriate to these
probabilities, that is, the optimum values of $k$ and $m$. For each of several cost
functions the following Table 1 shows the sampling variances of each system,
relative to the variance of sampling with equal probability. It also shows values of
$C_2 \sum P_i N_i$ for comparison with $C_1$.

TABLE 1

                       Average cost per primary unit of
                       listing and related operations        Variances relative to
    Unit costs         ($C_2 \sum P_i N_i$)                  equal probability

                                Prob. prop.  Prob.                    Prob. prop.  Prob.
                       Equal    to sq. root  prop.           Equal    to sq. root  prop.
  C_1    C_2    C_3    prob.    of size      to size         prob.    of size      to size

   5     .10     1     13.49    21.15        27.63           100      92           104
   5     .05     1      6.75    ...          13.82           100      88            97
   5     .02     1      ...      4.23         5.53           100      83            87
   5     0       1      ...     ...          ...             100      75            73

   2     .10     1     13.49    21.15        27.63           100      96           111
   2     .05     1      6.75    ...          13.82           100      93           106
   2     .02     1      ...      4.23         5.53           100      ...           97
   2     0       1      ...     ...          ...             100      79            77

   1     .10     1     13.49    21.15        27.63           100      97           114
   1     .05     1      6.75    ...          13.82           100      96           110
   1     .02     1      ...      4.23         5.53           100      93           103
   1     0       1      ...     ...          ...             100      82            81

  ...    .10     1     13.49    21.15        27.63           100      99           117
  ...    .05     1      6.75    ...          13.82           100      99           115
  ...    .02     1      ...      4.23         5.53           100      99           113

Some of the costs given in the table do not have unreasonable relationships
in terms of the situations encountered in practice in various types of jobs. The
comparisons are not affected by the absolute magnitudes of the costs but only
by their relative magnitudes. The results are consistent with the rough rules of
thumb given above. It is worth noting that in each of the above instances
probability proportionate to the square root of the size yields a comparatively low
variance.


















5. Sampling with or without replacement. In this paper the sampling with
varying probabilities was assumed to be carried out with replacement, which
ordinarily would not be advisable in practice. When sampling is done without
replacement the optimum probabilities and their approximations will be about
the same as for sampling with replacement in at least those instances where the
proportion of the population in the sample is small. Further investigation is
needed for large sampling rates.

6. Conclusion. In summary, it is not essential and may not be desirable to
give each element in the population (or stratum) the same chance of being drawn
in order to avoid bias or to have a consistent estimate. Estimate (1) is a
consistent estimate no matter what probabilities of selection are assigned to these
units. The use of variable probabilities of selection is another device to be added
to those already in the literature, such as stratification and efficient methods of
estimation, which make it possible to achieve the objectives of a sample survey
at reduced costs. Reference [2] gives another illustration of reductions in sampling
variance achieved through the use of varying probabilities in accordance with
the rules suggested above for approximating the optimum probabilities.


[1] Jerzy Neyman, "On the two different aspects of the representative method of purposive
selection," Roy. Stat. Soc. Jour., New Series, Vol. 97 (1934), pp. 558-606.

[2] Morris H. Hansen and William N. Hurwitz, "On the theory of sampling from finite
populations," Annals of Math. Stat., Vol. 14 (1943), pp. 333-362.



A SOLUTION TO THE PROBLEM OF OPTIMUM CLASSIFICATION 

By P. G. Hoel and R. P. Peterson 
University of California, Los Angeles 

1. Summary. By means of a general theorem, the space of the variables of 
classification is separated into population regions such that the probability 
of a correct classification is maximized. The theorem holds for any number of 
populations and variables but requires a knowledge of population parameters 
and probabilities. A second theorem yields a large sample criterion for deter¬ 
mining an optimum set of estimates for the unknown parameters. The two 
theorems combine to yield a large sample solution to the problem of how best to 
discriminate between two or more populations. 

2. Introduction. There are essentially two basic problems in discriminant
analysis. The first problem is to test whether the populations differ, since it 
would be futile to attempt a classification if the populations did not differ. The 
second problem is to find an efficient method for classifying individuals into their 
proper populations. In this paper, an optimum asymptotic solution of the 
second problem will be presented. 

3. Parameters known. Let $f_i = f_i(x_1, \dots, x_k)$, $(i = 1, \dots, r)$, denote the
probability density function of population $i$ in the region under consideration.
Let $p_i > 0$, $(i = 1, \dots, r)$, denote the probability that population $i$ will be
sampled if a single individual is selected at random from that region, and let $R$
denote the $k$ dimensional Euclidean variable space. Then the desired theorem
is the following:

Theorem 1. If $M_i$ denotes the region in $R$ where $p_i f_i \geq p_j f_j$, $(j = 1, \dots, r)$,
and where $p_i f_i > 0$, then the set of regions $M_i$, $(i = 1, \dots, r)$, in which any
overlap is assigned to the $M_i$ with the smallest index, will maximize the probability
of a correct classification.
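The rule of Theorem 1 amounts to assigning a point to the population with the largest value of $p_i f_i$, ties going to the smallest index. A minimal sketch follows, assuming univariate normal densities purely for illustration; the function names, priors, and means are hypothetical and not part of the theorem.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used only as an example f_i."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def classify(x, priors, densities):
    """Return the 0-based index i maximizing p_i * f_i(x), or None if all vanish."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    best = max(scores)
    if best <= 0.0:
        return None              # x lies outside every region M_i
    return scores.index(best)    # list.index picks the smallest index on ties

priors = [0.5, 0.3, 0.2]         # hypothetical p_i
densities = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 2.0, 1.0),
    lambda x: normal_pdf(x, 5.0, 1.0),
]
labels = [classify(x, priors, densities) for x in (-1.0, 1.5, 6.0)]
```

The same function works for any number of populations and any densities that can be evaluated at a point; only the list of callables changes.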

For the purpose of proving this theorem, consider any other set of
non-overlapping regions, $M_i'$. Since the addition to any of the regions $M_i'$ of a part
of $R$ throughout which all the functions $f_i$ vanish will not affect the probability
of a correct classification, there is no loss of generality in assuming that the set of
regions $M_i'$ contains the same portion of $R$ as the set of regions $M_i$ does. The
relationship between the two sets may be expressed by means of the formulas

(1)  $M_i = \sum_{j=1}^{r} M_{ij}$

and

(2)  $M_j' = \sum_{i=1}^{r} M_{ij}$,

where $M_{ij}$ denotes that part of $M_i$ which is contained in $M_j'$.



Since a sample point that falls in the region $M_i$ will be judged to have come
from population $i$, the probability of the correct classification of a single random
sample by means of the set $M_i$ is given by

(3)  $Q = p_1 \int_{M_1} f_1\,dE + \dots + p_r \int_{M_r} f_r\,dE$,

where $dE = dx_1\,dx_2 \cdots dx_k$. If $Q'$ denotes the probability of the correct
classification by means of the set $M_i'$,

$Q' = p_1 \int_{M_1'} f_1\,dE + \dots + p_r \int_{M_r'} f_r\,dE$.

In the notation of (1) and (2), these probabilities become

$Q = p_1 \int_{\sum_j M_{1j}} f_1\,dE + \dots + p_r \int_{\sum_j M_{rj}} f_r\,dE$

and

$Q' = p_1 \int_{\sum_i M_{i1}} f_1\,dE + \dots + p_r \int_{\sum_i M_{ir}} f_r\,dE$.

Now consider the difference $Q - Q'$. It can be expressed in the form

$Q - Q' = \sum_{i=1}^{r} \sum_{j=1}^{r} \left[ p_i \int_{M_{ij}} f_i\,dE - p_j \int_{M_{ij}} f_j\,dE \right]
        = \sum_{i=1}^{r} \sum_{j=1}^{r} \int_{M_{ij}} [p_i f_i - p_j f_j]\,dE$.

Since $M_{ij}$ is contained in $M_i$ and $p_i f_i \geq p_j f_j$, $(j = 1, \dots, r)$, holds throughout
$M_i$, it follows that each of these integrals is non-negative; consequently $Q \geq Q'$,
which proves the theorem.
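The inequality $Q \geq Q'$ can be seen in a small numerical case. The sketch below assumes two univariate normal populations with unit variance, means 0 and 2, and equal probabilities, so that $p_1 f_1 \geq p_2 f_2$ exactly when $x \leq 1$; these values and the function names are illustrative assumptions only.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_correct(cut, p1=0.5, p2=0.5, mean2=2.0):
    """Q for the rule: population 1 if x < cut, otherwise population 2.

    Population 1 is taken as N(0, 1) and population 2 as N(mean2, 1).
    """
    return p1 * normal_cdf(cut) + p2 * (1.0 - normal_cdf(cut - mean2))

optimum_cut = 1.0                       # boundary of M_1 for these values
q_opt = prob_correct(optimum_cut)       # Q for the regions of Theorem 1
q_other = [prob_correct(c) for c in (0.0, 0.5, 1.5, 2.5)]   # Q' for other regions
```

Every alternative cut point gives a probability of correct classification no larger than that of the optimum regions, as the theorem requires.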

This theorem yields a solution to the classification problem only when the $f_i$
are completely specified and the $p_i$ are known.

It will be observed that this theorem is similar to a generalization of a
fundamental lemma in the Neyman-Pearson theory of testing hypotheses [1], and to a
result by Welch [2].

If the basic weight function in Wald's [3] formulation of the multiple decision
problem assumes only the values 0 and 1, corresponding to whether or not a
correct classification is made, it will be found that the set of regions $M_i$ will
minimize the expected value of the loss in that formulation.

4. Parameters unknown. Since the $p_i$, as well as the parameters in the $f_i$,
are assumed to be unknown, $Q$ will be a function of such parameters. Let
$\theta_1, \dots, \theta_s$ denote all such parameters, including the $p_i$. Now let a random sample of size $n$
be taken from the region under consideration and let $\hat\theta_1, \dots, \hat\theta_s$ denote a set of





estimates of the parameters based on this sample. Since the total sample will
constitute a sample of size $n_1$ from $f_1$, $n_2$ from $f_2$, etc., where $n = n_1 + \dots + n_r$,
the $\theta$'s for $f_i$ will be estimated by means of a sample of size $n_i$ rather than of size $n$.
In the following arguments, it will not be necessary to distinguish between $\theta$'s
which are estimated by different size samples because the arguments will be
based on the order of terms with respect to the size sample and $n_i \sim n p_i$ with
probability one. Or, more simply, choose all $n_i$ equal.

Let $\hat M_i$ correspond to $M_i$ when the parameters are replaced by their sample
estimates and let $\hat Q$ denote the probability of a correct classification when using
the regions $\hat M_i$ in place of the regions $M_i$. Then, from (3),

$Q - \hat Q = \sum_{i=1}^{r} p_i \left[ \int_{M_i} f_i\,dE - \int_{\hat M_i} f_i\,dE \right]$.

Let $H = Q - \hat Q$. Since the estimates, $\hat\theta_i$, are random variables, $H$ will be a
random variable which is a function of the estimation functions, $\hat\theta_i$, as well as
of the parameters, $\theta_i$. The desired criterion for determining optimum estimates
is then given by the following theorem:

Theorem 2. If $E(\hat\theta_i - \theta_i)^4 = O(n^{-g})$, $g > 0$, and if in some neighborhood of the
point $\hat\theta_i = \theta_i$, $(i = 1, \dots, s)$, the function $H$ is continuous and possesses continuous
derivatives of the first, second, and third order with respect to the $\hat\theta_i$, then

$E(H) = \frac{1}{2} \sum_{i=1}^{s} \sum_{j=1}^{s} H_{ij}\, E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-3g/4})$,

where $H_{ij}$ denotes the partial derivative of $H$ with respect to $\hat\theta_i$ and $\hat\theta_j$ at the point
$(\theta_1, \dots, \theta_s)$.

The proof is similar to the type of proof used by Cramér [4] to obtain an
expression for the variance of a function of central moments.

By means of Tchebycheff's inequality [4], page 182, it follows that

$P[(\hat\theta_i - \theta_i)^4 > \epsilon^4] < \dfrac{E(\hat\theta_i - \theta_i)^4}{\epsilon^4}$.

From the theorem assumptions, there exists a constant $A$ such that

$P[(\hat\theta_i - \theta_i)^4 > \epsilon^4] < \dfrac{A n^{-g}}{\epsilon^4}$.

This is equivalent to

$P[|\hat\theta_i - \theta_i| > \epsilon] < \dfrac{A n^{-g}}{\epsilon^4}$.

If $E_1$ denotes the set of points in sample space where $|\hat\theta_i - \theta_i| < \epsilon$, $(i = 1, \dots, s)$,
and $E_2$ denotes the complementary set, this inequality implies that

(4)  $P[E_2] < \dfrac{s A n^{-g}}{\epsilon^4}$.





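The expected value of $H$ may be written in the form

(5)  $E(H) = \int_{E_1} H\,dP + \int_{E_2} H\,dP$.

Consider the order of the second integral. From (4) and the fact that $H$ is the
difference of two probabilities, it follows that

$\left| \int_{E_2} H\,dP \right| \leq \int_{E_2} dP = P[E_2] < \dfrac{s A n^{-g}}{\epsilon^4}$.

Consequently (5) becomes

(6)  $E(H) = \int_{E_1} H\,dP + O(n^{-g})$.

Now consider the first integral. From the theorem assumptions, if $\epsilon$ is chosen
sufficiently small, it follows that for any point in the set $E_1$, the function $H$
can be expanded in the form

$H = H(\theta) + \sum_{i} (\hat\theta_i - \theta_i) H_i(\theta) + \frac{1}{2} \sum_{i} \sum_{j} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) H_{ij}(\theta) + R$,

where $\theta$ denotes the point $(\theta_1, \dots, \theta_s)$, where

$R = \frac{1}{6} \sum_{i} \sum_{j} \sum_{k} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k) H_{ijk}(\theta')$,

and where $\theta'$ is some point in $E_1$. Since $\hat Q$ reduces to $Q$ when $\hat\theta = \theta$, $H(\theta) = 0$.
Furthermore, since $Q$ denotes the maximum probability of a correct classification,
$H \geq 0$ for all $\hat\theta$; hence $H_i(\theta) = 0$ and $H_{ii}(\theta) \geq 0$ for all $i$. Thus, for any point
in the set $E_1$,

$H = \frac{1}{2} \sum_{i} \sum_{j} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) H_{ij}(\theta) + R$.

If this expression is substituted in (6), $E(H)$ will become

(7)  $E(H) = \frac{1}{2} \sum_{i} \sum_{j} H_{ij}(\theta) \int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP + \int_{E_1} R\,dP + O(n^{-g})$.

Consider, first, the order of the remainder term. From the continuity
assumption on $H_{ijk}$, it follows that $H_{ijk}$ is bounded in $E_1$, say $|H_{ijk}(\theta')| < B$; hence

$\left| \int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k) H_{ijk}(\theta')\,dP \right| < B \int_{E_1} |(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\,dP$.

By Schwarz's inequality,

$\int_{E_1} |(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)(\hat\theta_k - \theta_k)|\,dP
   \leq \left[ \int_{E_1} (\hat\theta_i - \theta_i)^2 (\hat\theta_j - \theta_j)^2\,dP \int_{E_1} (\hat\theta_k - \theta_k)^2\,dP \right]^{1/2}$.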




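Similarly,

$\int_{E_1} (\hat\theta_i - \theta_i)^2 (\hat\theta_j - \theta_j)^2\,dP \leq \left[ \int_{E_1} (\hat\theta_i - \theta_i)^4\,dP \int_{E_1} (\hat\theta_j - \theta_j)^4\,dP \right]^{1/2}$
and
$\int_{E_1} (\hat\theta_k - \theta_k)^2\,dP \leq \left[ \int_{E_1} (\hat\theta_k - \theta_k)^4\,dP \right]^{1/2}$.

Since

$\int_{E_1} (\hat\theta_i - \theta_i)^4\,dP \leq \int_{E_1 + E_2} (\hat\theta_i - \theta_i)^4\,dP = O(n^{-g})$,

the preceding inequalities combine to give

(8)  $\left| \int_{E_1} R\,dP \right| = O(n^{-3g/4})$.

Now consider the first integral in (7). It may be written in the form

(9)  $\int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) - \int_{E_2} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP$.

By Schwarz's inequality,

$\left| \int_{E_2} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP \right| \leq \left[ \int_{E_2} (\hat\theta_i - \theta_i)^2\,dP \int_{E_2} (\hat\theta_j - \theta_j)^2\,dP \right]^{1/2}$.

Similarly,

$\int_{E_2} (\hat\theta_i - \theta_i)^2\,dP \leq \left[ \int_{E_2} (\hat\theta_i - \theta_i)^4\,dP \cdot P[E_2] \right]^{1/2}$.

If these inequalities are combined and inequality (4) is employed, (9) will
reduce to

(10)  $\int_{E_1} (\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)\,dP = E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j) + O(n^{-g})$.

Finally, if (8) and (10) are employed in (7), it will reduce to the result stated
in the theorem.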
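The leading quadratic term of Theorem 2 can be compared with the exact loss in a simple case. The sketch below assumes two normal populations with unit variance, true means 0 and $d$, and equal known probabilities, with the estimated boundary taken as the midpoint of the two estimated means. Under these assumptions every $H_{ij}$ equals $\frac{1}{4}(d/2)\phi(d/2)$, where $\phi$ is the standard normal density. All names and numbers are illustrative, not taken from the paper.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_correct(cut, d):
    """Probability of correct classification for the rule x < cut."""
    return 0.5 * normal_cdf(cut) + 0.5 * (1.0 - normal_cdf(cut - d))

def loss(mu1_hat, mu2_hat, d):
    """H = Q - Q_hat when the cut is the midpoint of the estimated means."""
    return prob_correct(d / 2.0, d) - prob_correct(0.5 * (mu1_hat + mu2_hat), d)

d = 2.0                      # true separation of the means (assumed)
h = 0.01                     # common estimation error in both means (assumed)
h_ij = 0.25 * (d / 2.0) * normal_pdf(d / 2.0)   # second derivatives at the true point

exact = loss(0.0 + h, d + h, d)
# (1/2) * sum over i, j of H_ij * e_i * e_j with e_1 = e_2 = h
quadratic = 0.5 * h_ij * (h * h + 2.0 * h * h + h * h)
```

With independent sample means based on $n_1$ and $n_2$ observations, the same $H_{ij}$ would give a leading term $\frac{1}{2} H_{ij} (1/n_1 + 1/n_2)$ for $E(H)$, which is of order $n^{-1}$.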

The order of the leading term in $E(H)$ depends upon the nature of the
estimating functions, $\hat\theta_i$. In order to insure that this term will be the dominating
term, and thus rule out pathological situations, only that class of estimating
functions (estimators) will be considered for which this term will be of lower
order than that of the remainder term. If the estimators are means or central
moments, for example, then $g = 2$. For such estimators the order of the remainder
term is $O(n^{-3/2})$, whereas the order of the leading term is not higher than $O(n^{-1})$.

A set of estimators will be called an optimum set if it maximizes the expected
value of the probability of a correct classification, or, what is equivalent, if it
minimizes $E(H)$. Since only large samples are being considered here, it is
necessary to define optimum in an asymptotic sense. Consider sets of estimators for
which $E(H)$ is of order $O(n^{-q})$. For this class of estimators, a set will be called
asymptotically optimum if it minimizes

$\lim_{n \to \infty} n^{q} E(H)$.

Among asymptotically optimum sets of various orders, the set corresponding
to the highest order would naturally be considered as the best asymptotic set.
Now from Theorem 2, it readily follows that a set of estimators which minimizes

(11)  $\lim_{n \to \infty} n^{q} \sum_{i} \sum_{j} H_{ij}\, E(\hat\theta_i - \theta_i)(\hat\theta_j - \theta_j)$

will be an asymptotically optimum set.

5. Maximum likelihood estimates. If the estimates $\hat\theta_i$ are unbiased and
uncorrelated, (11) will reduce to

(12)  $\lim_{n \to \infty} n^{q} \sum_{i} H_{ii}\, \sigma_i^2$,

where $\sigma_i^2 = E(\hat\theta_i - \theta_i)^2$ is a function of $n$ as well as of the parameters. Since, from
the discussion preceding (7), $H_{ii} \geq 0$, it follows that (12) will be a minimum when
the $\sigma_i^2$ assume their minimum values. Now it is known, [4], page 504, that under
mild restrictions maximum likelihood estimates possess minimum asymptotic
variances; hence for estimators of the type being considered which also satisfy the
conditions in [4], the maximum likelihood estimates of the $\theta_i$ will yield an
asymptotically optimum set of estimates for the classification problem.

REFERENCES

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Roy. Soc. Phil. Trans., Vol. 231 (1933), pp. 289-337.

[2] B. L. Welch, "Note on discriminant functions," Biometrika, Vol. 31 (1939), pp. 218-220.

[3] A. Wald, "Contributions to the theory of statistical estimation and testing
hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-304.

[4] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946,
pp. 352-356.



NOTES 

This section is devoted to brief research and expository articles on methodology and
other short items.

A GENERALIZATION OF WALD’S FUNDAMENTAL IDENTITY 

By Gunnar Blom 
University of Stockholm 

1. Summary. The fundamental identity is generalized to the case of independent
random variables with non-identical distributions. The conditions for the
validity of the differentiation of the identity are discussed. The results given in
[1], [2], and [3] are obtained as special cases.

2. A property of cumulative sums. Let $z_1, z_2, \dots$ be an infinite sequence of
independent random variables, $F_1(z), F_2(z), \dots$ their distribution functions (d.f.)
and $\varphi_1(t), \varphi_2(t), \dots$ their moment-generating functions so that $\varphi_\nu(t) = E(e^{t z_\nu})$.
$a_N$ and $b_N$ are given constants ($a_N > b_N$, $N = 1, 2, \dots$). $n$ is defined as the
smallest integer $N$ for which $Z_N = z_1 + \dots + z_N$ is $\geq a_N$ or $\leq b_N$.

We first give two lemmas. 

Lemma 1. If two positive quantities $\delta$ and $\epsilon$ can be found such that one at least
of the following conditions a) and b) are satisfied

a) $P(z_\nu > \delta) > \epsilon$ for all $\nu$ and $\limsup_{N \to \infty} a_N < \infty$,

b) $P(z_\nu < -\delta) > \epsilon$ for all $\nu$ and $\liminf_{N \to \infty} b_N > -\infty$,

then for any $k \geq 0$

(1)  $\lim_{N \to \infty} N^k P(n > N) = 0$.

An inspection of the proof of (4) in [4] shows that this formula holds when the 
conditions of the lemma are satisfied. The lemma follows. 

Lemma 1 can be generalized as follows. 

Lemma 2. If two positive quantities $\delta$ and $\epsilon$ and a sequence $c_1, c_2, \dots$ can be
found such that one at least of the following conditions a) and b) are satisfied

a) $P(z_\nu + c_\nu > \delta) > \epsilon$ for all $\nu$, $\limsup_{N \to \infty} a_N < \infty$, $\limsup_{N \to \infty} \sum_{\nu=1}^{N} c_\nu < \infty$,

b) $P(z_\nu + c_\nu < -\delta) > \epsilon$ for all $\nu$, $\liminf_{N \to \infty} b_N > -\infty$, $\liminf_{N \to \infty} \sum_{\nu=1}^{N} c_\nu > -\infty$,

then (1) is true.




Proof: In case a) we put $z_\nu' = z_\nu + c_\nu$, $Z_N' = \sum_{\nu=1}^{N} z_\nu'$ and $a_N' = a_N + \sum_{\nu=1}^{N} c_\nu$.
The inequality $Z_N \geq a_N$ then becomes $Z_N' \geq a_N'$. As $P(z_\nu' > \delta) > \epsilon$ and
$\limsup a_N' < \infty$, Lemma 1 can be applied to the sequence $z_1', z_2', \dots$, and thus (1) is
true. When conditions b) are satisfied, the proof is analogous.

3. The generalized fundamental identity. In this section we shall consider
sequences of random variables of the type defined in Lemma 2. We shall prove
two theorems, the first of which is valid for complex values of $t$ and the second
only for real values of $t$.

Theorem 1. Assuming that

1°. one at least of conditions a) and b) of Lemma 2 is satisfied;

2°. $b \leq b_N < a_N \leq a$, where $a$ and $b$ are finite;

3°. for some complex (or real) value of $t$, $\varphi_\nu(t)$ exists for all $\nu$ and is $\neq 0$ and
$\liminf_{N \to \infty} |\varphi_1(t) \cdots \varphi_N(t)| > 0$,

then

(2)  $E\left[ e^{t Z_n} (\varphi_1(t) \cdots \varphi_n(t))^{-1} \right] = 1$.

Proof. Let $W_m$ denote the set of all sequences $z_1, \dots, z_N$ in the $N$-dimensional
Euclidean space $\mathfrak{R}_N$ for which $n = m$ $(m \leq N)$, $W_m'$ the projection of $W_m$ on $\mathfrak{R}_m$
and $W_{n>N}$ all sequences for which $n > N$. We have identically

$\left[ \sum_{m=1}^{N} \int_{W_m} + \int_{W_{n>N}} \right] e^{t Z_N}\,dF_1 \cdots dF_N = \int_{\mathfrak{R}_N} e^{t Z_N}\,dF_1 \cdots dF_N = \varphi_1(t) \cdots \varphi_N(t)$.

Dividing by the right member and cancelling common factors we obtain

(3)  $\sum_{m=1}^{N} (\varphi_1 \cdots \varphi_m)^{-1} \int_{W_m'} e^{t Z_m}\,dF_1 \cdots dF_m + (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{t Z_N}\,dF_1 \cdots dF_N = 1$.

When $N \to \infty$ the first sum tends to the left member of (2). We thus have to
investigate the last term in (3) which we denote by $R_N$. We can write

(4)  $R_N = (\varphi_1 \cdots \varphi_N)^{-1} \int_{W_{n>N}} e^{t Z_N}\,dF_1 \cdots dF_N = (\varphi_1 \cdots \varphi_N)^{-1} P(n > N)\, E_{n>N}\, e^{t Z_N}$.

It follows from Lemma 2 that $P(n > N) \to 0$. As $b < Z_N < a$ by 2°, we conclude
that $R_N \to 0$. This proves the theorem.
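Identity (3) holds exactly for every $N$, so the generalized identity can be checked by direct computation on a simple case. The sketch below assumes steps $z_\nu = \pm 1$ with probabilities that alternate between 0.4 and 0.6, constant integer limits $a = 3$ and $b = -3$, and real $t = 0.3$; these choices and all function names are hypothetical. It accumulates the sum in (3) and the remainder $R_N$ by propagating the distribution of $Z_N$ over paths that have not yet stopped.

```python
import math

def identity_terms(p_of, t, a, b, max_steps):
    """Return (sum in (3), remainder R_N) for +/-1 steps with P(z_nu = 1) = p_of(nu)."""
    probs = {0: 1.0}          # distribution of Z_N on paths with n > N
    prod_phi = 1.0            # phi_1(t) * ... * phi_N(t)
    total = 0.0
    for nu in range(1, max_steps + 1):
        p = p_of(nu)
        prod_phi *= p * math.exp(t) + (1.0 - p) * math.exp(-t)
        nxt = {}
        for s, pr in probs.items():
            for step, w in ((1, p), (-1, 1.0 - p)):
                new = s + step
                if new >= a or new <= b:
                    # the path stops here with n = nu and Z_n = new
                    total += pr * w * math.exp(t * new) / prod_phi
                else:
                    nxt[new] = nxt.get(new, 0.0) + pr * w
        probs = nxt
    remainder = sum(pr * math.exp(t * s) for s, pr in probs.items()) / prod_phi
    return total, remainder

alternating = lambda nu: 0.4 if nu % 2 == 1 else 0.6   # non-identical distributions
total, remainder = identity_terms(alternating, t=0.3, a=3, b=-3, max_steps=400)
```

For these values the product $\varphi_1 \cdots \varphi_N$ grows with $N$, so condition 3° is met, and the remainder is negligible after a few hundred steps while the accumulated sum is 1 to within rounding.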

Theorem 2. If, for some real value of $t$, $\varphi_\nu(t)$ exists for all $\nu$ and if quantities
$c_\nu$, $\epsilon > 0$ and $\delta > 0$ can be found such that at least one of the following conditions
a) and b) are satisfied for all $\nu$

a) $\limsup_{N \to \infty} a_N < \infty$, $\limsup_{N \to \infty} \sum_{\nu=1}^{N} c_\nu < \infty$ and

(5a)  $A_\nu(t, \delta) = \dfrac{1}{\varphi_\nu(t)} \int_{\delta - c_\nu}^{\infty} e^{tz}\,dF_\nu(z) > \epsilon$,  $(\nu = 1, 2, \dots)$,

b) $\liminf_{N \to \infty} b_N > -\infty$, $\liminf_{N \to \infty} \sum_{\nu=1}^{N} c_\nu > -\infty$ and

(5b)  $\dfrac{1}{\varphi_\nu(t)} \int_{-\infty}^{-\delta - c_\nu} e^{tz}\,dF_\nu(z) > \epsilon$,  $(\nu = 1, 2, \dots)$,

then (2) holds.

The conditions of the theorem become more attractive if the theorem is 
limited to the somewhat less general cases mentioned in the Corollary below. 
The above formulation has been chosen mainly because of an important applica¬ 
tion to identical variables in Sec. 6. 

Proof. The theorem is proved if we can show that $R_N$ in (4) tends to zero when
$N \to \infty$. For that purpose we use the transformation (cf. [5] and [3])

(6)  $G_\nu(z; t) = \dfrac{1}{\varphi_\nu(t)} \int_{-\infty}^{z} e^{tx}\,dF_\nu(x)$,  $(\nu = 1, 2, \dots)$.

$G_\nu(z; t)$ is obviously a d.f. for every real $t$ (for which $\varphi_\nu(t)$ exists). When (5a)
holds,

$P[z_\nu + c_\nu > \delta \mid G_\nu(z; t)] = A_\nu(t, \delta)$.

Here the expression in the left member denotes the probability that $z_\nu + c_\nu > \delta$,
when $G_\nu$ is the d.f. of $z_\nu$.

Consequently, when conditions a) are fulfilled, a sequence of random variables
with the d.f.'s $G_1(z; t), G_2(z; t), \dots$ or, with one notation, $G(t)$ satisfies the
conditions a) of Lemma 2. It follows that

$\lim_{N \to \infty} P(n > N \mid G(t)) = 0$.

Introducing $G_\nu(z; t)$ in $R_N$ we find

$R_N = \int_{W_{n>N}} dG_1 \cdots dG_N = P(n > N \mid G(t))$.

Consequently $R_N \to 0$. When conditions b) are fulfilled, the proof is analogous.

Corollary to Theorem 2. If 1° $\varphi_\nu(t) e^{t c_\nu} \leq H(t) < \infty$, 2° $t$ is positive and
conditions a) of Lemma 2 hold or $t$ is negative and conditions b) of Lemma 2 hold,
then the generalized fundamental identity is true.

For, in the first case

$A_\nu(t, \delta) \geq \dfrac{e^{t(\delta - c_\nu)}}{\varphi_\nu(t)} \int_{\delta - c_\nu}^{\infty} dF_\nu(z) > \dfrac{\epsilon\, e^{t\delta}}{H(t)}$

so that (5a) is satisfied, and similarly when $t$ is negative.

The following special case deserves particular attention as it covers most
cases occurring in practice and the conditions become very simple: If a sequence
of random variables satisfies conditions a) and b) of Lemma 1 simultaneously, a
sufficient condition for the validity of (2) for some given real value of $t$ is that the
sequence $\varphi_\nu(t)$ is bounded.





4. Application to Poisson variables. As an application of (2) we consider a
sequence of Poisson variables with the parameters $\lambda m_\nu$, where $\lambda$ is a positive
quantity and $m_\nu$ are positive integers. From the well-known formula

$\varphi_\nu(t) = e^{\lambda m_\nu (e^t - 1)}$


we easily conclude that the conditions of Theorem 1 are valid if R(e ( ) 5: 1. (With 
Klin (5a) we find that (2) holds even for negative t .) If, in particular, we 

choose t so that e‘ = 1 + = c*, we have the simple formula 

E(d") = 1 , (*= 1 , 2 , • •)■ 


5. Differentiation of the generalized fundamental identity. In this section $t$
is assumed to be real. We denote the $k$-th derivative of $\varphi_\nu(t)$ by $\varphi_\nu^{(k)}(t)$. We shall
prove the following theorem which corresponds to Theorems 1 and 2.

Theorem 3. If for all $t$ in a closed interval $I$ the conditions stated in Theorems
1 or 2 are satisfied and if, in addition, the functions

$\dfrac{\varphi_\nu^{(k)}(t)}{\varphi_\nu(t)}$

are uniformly bounded with respect to both $\nu$ and $t$ (in $I$) for $k = 1, 2, \dots, r$, then the
generalized fundamental identity may be differentiated $r$ times with respect to $t$ for
any $t$ in the interior of $I$.

We use a method of proof which is similar to that used in [2]. We first show
that the sum in (3) may be differentiated $r$ times under the integral signs and
secondly that the $r$-th derivative of $R_N$ tends to zero uniformly in $t$ when $N \to \infty$.

The rth derivative of the general term of the series in (3) consists of a finite 
number of terms of the form 


Jm(t) = (<pi • • ■ <p m ) f , Zt, e 12 ” 1 dF i • • • dF m (y g X; p, X = 1, 2, * ■ • r), 


and the rth derivative of Rx in (4) consists of a finite number (which does not 
depend on N ) of similar expressions with N substituted for m and W n >s for 
W m . Hu is a sum of m" and If terms respectively which is symmetric in v. 

(lc g X; v = 1, 2, ■ ■ • m) and are thus major- 




The terms are functions of ... 
ated by the same constant C. 

Further, we can always find a positive quantity U such that for all t in I 


Z\e n " | ^ e <ol2ml g (e hZn + e - ' 02 "). 




Hence 


(7) I J m (t) I g (¥>!•■ • v m y l Cm 1 ‘ [ , (e f ° 2m + e-‘° Zm ) dF,--- dF m 

J ir„ 

The rest of the proof is divided into two parts corresponding to the conditions 
of Theorem 1 and those of Theorem 2. 





When the conditions of Theorem 2 are fulfilled we make the transformation 
(6) in (7) with t = U and t = — to. Then 

I J m {t) | g Cm^Pin = m \ G(U)) + P(n = m\ G(-t o))] g 2 Cm* < «. 

This justifies the differentiation of the series in (3). 

Substituting N for m and n > N forn = m in the above expression we further 
have 

I Jn® I ^ GN^Pin > N | <?(*,)) + Pin > N \ G(-t ,))], 

and conclude from Lemma 2 with k = y in (1) that Jn(i) tends to zero uniformly 
in t. It follows that the rth derivative of Ry also tends to zero uniformly in t. 

In the second part of the proof we assume the conditions of Theorem 1 to be 
satisfied. We then write (7) in the following form 

(8) | J m (t) | £<?(*!••• = m)E^ m (e‘ oZn + 

where E n=m signifies the conditional expectation when it is known that n = m. 
From the definition of n it follows that, when n = m, we have l m ~i < Z m -\ < 
a m - 1 and Z m ^ a m or g b m . Hence 

E K . m (e hZn ) g E n . m {e 1 ^ | Z m ^ a m ) = E n . m [e to(Zm - l+ ^ | Z m _i + s m ^ a m ] 

S e tottm -'E[e to ‘ m | z m > a m - b m ^} < *. 

The second exponential can be treated in a similar way. Thus J m (t) is majorated 
by a finite expression 

Finally, we substitute N for m and n > N for n = m in (8). I being a closed 
interval it follows from condition 3° in Theorem 1 that we can find a constant 
C such that 

| Jy(t) | ^ CN“P(n > N)E n>N (e hZN + e~ hZli ). 

From the definition of n and condition 2° in Theorem 1 we have b < Zy < a. 
An application of Lemma 2 then shows that Jy(t) tends to zero uniformly in t. 
This proves the theorem. 

Corollary to Theorem 3. When the conditions stated in the Corollary to Theorem 2
are fulfilled for all $t$ in the closed interval $I$, Theorem 3 is true.

This is obvious. 

6. The fundamental identity for identically distributed variables. In the
special case of identically distributed variables for which $P(z = 0) < 1$ and
$0 < \varphi(t) < \infty$ we infer from Theorem 1 that the fundamental identity

(9)  $E[e^{t Z_n} (\varphi(t))^{-n}] = 1$

holds if $t$ is complex and $|\varphi(t)| \geq 1$. This is the case discussed in [1].

Further, when $P(z = 0) < 1$, the integrals $\int_{\alpha}^{\infty} e^{tz}\,dF$ and $\int_{-\infty}^{\beta} e^{tz}\,dF$ cannot both
be zero for every $\alpha > 0$ and $\beta < 0$, and thus we infer from Theorem 2 that the
fundamental identity holds for all real $t$ (if the limits $a_N$ and $b_N$ are chosen in
accordance with the conditions of this theorem). This proposition is somewhat
more general than that proved in [3] by a similar method.

It also follows from the last remark and Theorem 3 that, when $P(z = 0) < 1$,
(9) can be differentiated any number of times for any real $t$. This proposition
contains the results in [2] and [3] as special cases.

7. A generalization. We finally remark that the assumption made in Theorem
3 that the expressions containing derivatives of $\varphi_\nu$ are uniformly bounded is
unnecessarily restrictive. For example, it seems possible to prove that the first
derivative of (2) may be obtained by differentiation under the expectation 
sign if the series (cf. Corollary 1 to Theorem 7.4. in [6]) 

m-1 »-l 

is uniformly convergent with respect to t. 

REFERENCES

[1] A. Wald, "On cumulative sums of random variables," Annals of Math. Stat., Vol. 15
(1944), p. 286.

[2] A. Wald, "Differentiation under the expectation sign in the fundamental identity of
sequential analysis," Annals of Math. Stat., Vol. 17 (1946), pp. 493-497.

[3] G. E. Albert, "A note on the fundamental identity of sequential analysis," Annals of
Math. Stat., Vol. 18 (1947), pp. 593-596 and Vol. 19 (1948), pp. 420-427.

[4] C. Stein, "A note on cumulative sums," Annals of Math. Stat., Vol. 17 (1946), pp. 498-499.

[5] H. Cramér, "Sur un nouveau théorème-limite de la théorie des probabilités," Actualités
scientifiques et industrielles, no. 730, Hermann et Cie., 1938, p. 5.

[6] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), pp. 228-229.


SPREAD OF MINIMA OF LARGE SAMPLES 

By Brockway McMillan 
Bell Telephone Laboratories, Murray Hill, N. J.

1. Theorems. Let $x$ have the continuous cumulative distribution function
$F(x)$. Let $(x_1, \dots, x_N)$ be a sample of $N$ independent values of $x$ and
$y = \inf (x_1, \dots, x_N)$. Then $y$ is a random variable with the cumulative distribution
function

(1)  $G_N(y) = 1 - (1 - F(y))^N$.

Let $K$ values of the new variable $y$ be drawn, $(y_1, \dots, y_K)$, and let the spread
$w = \sup (y_1, \dots, y_K) - \inf (y_1, \dots, y_K)$.





Fixing K, we consider the cumulative distribution function of w, Pn(w), as 
N —» ao. That is, we have K large samples of x and wish to examine the spread 
among their minima. It is evident intuitively that if F(x) = 0 for some finite x, 
these minima are bounded from below and will cluster near the vanishing point 
of F(x), making w —» 0 statistically as N —» <®. Our theorems also show that 
even when y —> — » statistically, i.e., when F(x) - 0 for no finite x, the spread 
w —> 0 statistically if the tail of F(x) is sufficiently small (e.g, Gaussian). On 
the other hand, if F(x) = 0(e kz ) as x —> — «, the distribution Pn(w) does not 
peak as N —> <», while for larger tails (e.g. algebraic) w> —► + » statistically 
Two simple theorems are 

I. If

    $\lim_{x \to -\infty} \frac{F(x)}{F(x+s)} = 1,$

then

    $\lim_{N \to \infty} P_N(s) = 0.$

II. Let $s > 0$. If $F(x_0) = 0$ for some $x_0 > -\infty$, or if

    $\lim_{x \to -\infty} \frac{F(x)}{F(x+s)} = 0,$

then

    $\lim_{N \to \infty} P_N(s) = 1.$

Theorem I is directly applicable to distributions with algebraic tails, theorem II 
to Gaussian tails. We prove them both as corollaries of the more general results: 
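III. If

    $\liminf_{x \to -\infty} \frac{F(x)}{F(x+s)} = l,$

then

    $\limsup_{N \to \infty} P_N(s) \le (1 - l)^{K-1}.$

IV. Let $s > 0$. If $F(x) = 0$ for no finite $x$ and

    $\limsup_{x \to -\infty} \frac{F(x)}{F(x+s)} = L,$

then

    $\liminf_{N \to \infty} P_N(s) \ge [e^{-aL} - e^{-a}]^K$

for any $a > 0$.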





Theorems III and IV together show that an exponential tail ($F(x) = O(e^{kx})$)
leads to a $P_N(w)$ which, asymptotically, is bounded away from 0 for any $w > 0$
and bounded away from 1 for $w$ sufficiently small.

2. Proofs. Explicitly, for any $s > 0$,

(2)  $P_N(s) = K \int_{-\infty}^{\infty} [G_N(x+s) - G_N(x)]^{K-1} \, dG_N(x+s).$
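As an illustration of (2) and of the bounds in III and IV, the following minimal sketch evaluates $P_N(s)$ numerically for one exponential tail. The choice $F(x) = e^x$ for $x \le 0$ (so that $l = L = e^{-s}$), the substitution $u = G_N(x+s)$, the midpoint rule, and all function names are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def spread_cdf_exponential_tail(s, K, N, steps=20000):
    """P_N(s) from equation (2) for the assumed tail F(x) = e^x, x <= 0.

    With u = G_N(x + s), (2) becomes K * integral_0^1 [u - G_N(x)]^(K-1) du,
    which is approximated by the midpoint rule.
    """
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) / steps
        # Invert u = 1 - (1 - F(x + s))^N to recover F(x + s).
        f_shifted = -math.expm1(math.log1p(-u) / N)
        # For this particular F, F(x) = e^{-s} * F(x + s).
        f_x = math.exp(-s) * f_shifted
        # G_N(x) = 1 - (1 - F(x))^N, from equation (1).
        g_x = -math.expm1(N * math.log1p(-f_x))
        total += (u - g_x) ** (K - 1)
    return K * total / steps


def upper_bound_iii(l, K):
    """Right-hand side of Theorem III."""
    return (1.0 - l) ** (K - 1)


def lower_bound_iv(a, L, K):
    """Right-hand side of Theorem IV for a chosen a > 0."""
    return (math.exp(-a * L) - math.exp(-a)) ** K


if __name__ == "__main__":
    s, K, N = 1.0, 2, 1000
    ratio = math.exp(-s)  # l = L for this F
    print(lower_bound_iv(1.5, ratio, K),
          spread_cdf_exponential_tail(s, K, N),
          upper_bound_iii(ratio, K))
```

Under these assumptions the computed value stays strictly between the two bounds, neither near 0 nor near 1, which is the behavior the text ascribes to exponential tails.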


Turning now to III: given $\epsilon > 0$, choose $x_1 = x_1(\epsilon)$ so that (i) $F(x_1) \ne 0$, and
(ii) $x < x_1$ implies

(3)  $\frac{F(x)}{F(x+s)} \ge l - \epsilon.$

We then rewrite (2) as

(4)  $P_N(s) = \int_{-\infty}^{x_1} \left[1 - \frac{G_N(x)}{G_N(x+s)}\right]^{K-1} dG_N(x+s)^K + \int_{x_1}^{\infty} \left[1 - \frac{G_N(x)}{G_N(x+s)}\right]^{K-1} dG_N(x+s)^K.$

Treating $G_N(x+s)^K$ as the independent variable, the first integral may be
evaluated by the mean value theorem in the form

(5)  $\left[1 - \frac{G_N(x_2)}{G_N(x_2+s)}\right]^{K-1} \int_{-\infty}^{x_1} dG_N(x+s)^K \le \left[1 - \frac{G_N(x_2)}{G_N(x_2+s)}\right]^{K-1}$

with an appropriate $x_2 = x_2(N)$, $-\infty < x_2 < x_1$.

Using the form (2) of the integrand in the second term of (4), we may bound
the latter by

(6)  $K \int_{x_1}^{\infty} dG_N(x+s) \le K[1 - G_N(x_1+s)],$

since

    $G_N(x+s) - G_N(x) \le 1.$

Now, by factoring (1),

(7)  $\frac{G_N(x)}{G_N(x+s)} = \frac{F(x)}{F(x+s)} \cdot \frac{1 + Q + \cdots + Q^{N-1}}{1 + Q_s + \cdots + Q_s^{N-1}} \ge \frac{F(x)}{F(x+s)},$

where $Q = 1 - F(x)$, $Q_s = 1 - F(x+s) \le Q$. Combining (3), (4), (5), (6),
and (7),

    $P_N(s) \le [1 - l + \epsilon]^{K-1} + K[1 - G_N(x_1+s)].$

Since $F(x_1+s) \ge F(x_1) > 0$, we have

    $\lim_{N \to \infty} G_N(x_1+s) = 1.$

Hence,

    $\limsup_{N \to \infty} P_N(s) \le [1 - l + \epsilon]^{K-1}$

and III follows by letting $\epsilon \to 0$. Then I follows immediately with $l = 1$, when
we note that $P_N(s) \ge 0$.

To prove IV, choose any $a > 0$. By hypothesis, for sufficiently large $N$ we
may always find $x_N = x_N(a)$ such that

(8)  $F(x_N) = \frac{aL}{N}.$

By hypothesis, and the monotonicity of $F(x)$, $x_N \to -\infty$ as $N \to \infty$. For any
$\epsilon > 0$, therefore, we can find $N_0 = N_0(a, \epsilon)$ such that $N \ge N_0$ implies

(9)  $\frac{F(x_N)}{F(x_N+s)} \le \frac{L}{1 - \epsilon},$

or $F(x_N+s) \ge \frac{a}{N}(1 - \epsilon)$. Directly from (2), since $s > 0$,

    $P_N(s) \ge K \int_{x_N-s}^{x_N} [G_N(x+s) - G_N(x)]^{K-1} \, dG_N(x+s)$

    $\ge K \int_{x_N-s}^{x_N} [G_N(x+s) - G_N(x_N)]^{K-1} \, dG_N(x+s).$

But this last integral is of the form

    $\int K(U - G)^{K-1} \, dU = (U - G)^K,$

whence

    $P_N(s) \ge [G_N(x_N+s) - G_N(x_N)]^K,$

or

(10)  $P_N(s) \ge [(1 - F(x_N))^N - (1 - F(x_N+s))^N]^K.$

By (8) and (9), therefore,

    $P_N(s) \ge \left[\left(1 - \frac{aL}{N}\right)^N - \left(1 - \frac{a(1-\epsilon)}{N}\right)^N\right]^K.$

Since this holds for all $N \ge N_0(a, \epsilon)$,

    $\liminf_{N \to \infty} P_N(s) \ge [e^{-aL} - e^{-a(1-\epsilon)}]^K.$

This last, in turn, now holds for any $\epsilon > 0$, hence

    $\liminf_{N \to \infty} P_N(s) \ge [e^{-aL} - e^{-a}]^K.$

This now holds for any $a > 0$. Maximizing on $a$ yields a sharper bound than the
result of IV. The applicable part of II follows, when $L = 0$, by letting $a \to \infty$.
That the conclusion of II holds when $F(x_0) = 0$ for some finite $x_0$ follows from
(10) with $x_N$ replaced by some $x_1$ such that $F(x_1) = 0$, $F(x_1+s) > 0$.





ON THE CONVERGENCE OF THE CLASSICAL ITERATIVE METHOD
OF SOLVING LINEAR SIMULTANEOUS EQUATIONS$^1$

By Edgar Reich 
Massachusetts Institute of Technology 

The classical iterative method, or Seidel method, is a scheme for solving the
system of linear algebraic equations

    $\sum_{j=1}^{n} A_{ij} x_j = b_i, \quad (i = 1, 2, \ldots, n),$

by successive approximation, as follows:

If $x^{(\nu)} = (x_1^{(\nu)}, x_2^{(\nu)}, \ldots, x_n^{(\nu)})$ is the $\nu$th approximation of the solution, the
$(\nu+1)$st approximation, $x^{(\nu+1)} = (x_1^{(\nu+1)}, x_2^{(\nu+1)}, \ldots, x_n^{(\nu+1)})$, is obtained from
the relations

    $A_{11} x_1^{(\nu+1)} + A_{12} x_2^{(\nu)} + A_{13} x_3^{(\nu)} + \cdots + A_{1n} x_n^{(\nu)} = b_1,$

    $A_{21} x_1^{(\nu+1)} + A_{22} x_2^{(\nu+1)} + A_{23} x_3^{(\nu)} + \cdots + A_{2n} x_n^{(\nu)} = b_2,$

    $A_{31} x_1^{(\nu+1)} + A_{32} x_2^{(\nu+1)} + A_{33} x_3^{(\nu+1)} + \cdots + A_{3n} x_n^{(\nu)} = b_3,$

    $\cdots$

    $A_{n1} x_1^{(\nu+1)} + A_{n2} x_2^{(\nu+1)} + A_{n3} x_3^{(\nu+1)} + \cdots + A_{nn} x_n^{(\nu+1)} = b_n,$

$x_1^{(\nu+1)}$ being obtained from the first equation, then $x_2^{(\nu+1)}$ from the second, and
so on.
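The scheme just described can be sketched in a few lines. The function names, the stopping rule on successive approximations, and the test matrix in the usage lines are illustrative assumptions, not part of the paper.

```python
def seidel_sweep(A, b, x):
    """One pass through the n relations: each x_i is overwritten as soon as
    it is computed, so later equations already use the new values."""
    n = len(b)
    new_x = list(x)
    for i in range(n):
        # Entries j < i hold (v+1)st values, entries j > i still hold vth values.
        s = sum(A[i][j] * new_x[j] for j in range(n) if j != i)
        new_x[i] = (b[i] - s) / A[i][i]
    return new_x


def seidel_solve(A, b, x0=None, tol=1e-12, max_sweeps=10000):
    """Repeat sweeps until successive approximations agree to within tol."""
    x = list(x0) if x0 is not None else [0.0] * len(b)
    for _ in range(max_sweeps):
        new_x = seidel_sweep(A, b, x)
        if max(abs(u - v) for u, v in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("no convergence within max_sweeps")


if __name__ == "__main__":
    # A symmetric positive definite example whose exact solution is (1, 2, 3).
    A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    b = [12.0, 7.0, 17.0]
    print(seidel_solve(A, b))
```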

The given system can be written in matrix notation as $Ax = b$ where $A$ is
a non-singular square matrix of order $n$, and $x$ and $b$ are column vectors of order $n$.
Let us define square matrices $A_1$ and $A_2$ as follows:

    $(A_1)_{ij} = A_{ij}$ if $i \ge j$,  $(A_1)_{ij} = 0$ if $i < j$;

    $(A_2)_{ij} = A_{ij}$ if $i < j$,  $(A_2)_{ij} = 0$ if $i \ge j$.

(Note that $A_1 + A_2 = A$.)

With this notation the Seidel method can be written as the matric difference
equation

    $A_1 x^{(\nu+1)} + A_2 x^{(\nu)} = b.$

Now various writers, among them C. E. Berry in this journal (see list of references
at end of this paper), have shown that a necessary and sufficient condition
for convergence, i.e., a necessary and sufficient condition for

    $\lim_{\nu \to \infty} (x_i^{(\nu)} - x_i) = 0, \quad (i = 1, 2, \ldots, n),$

is that

(1) $A_1$ has an inverse; that is, $A_{ii} \ne 0$ for any $i$.

(2) The characteristic roots of $(A_1^{-1} A_2)$ all have an absolute value smaller
than unity.

$^1$ Work done under Office of Naval Research Contract N5ori60.

It would be advantageous to rephrase the above condition, if possible, in terms 
of simpler requirements on A. As a step in this direction the following theorem 
is offered: 

Theorem. If $A$ is a real, symmetric nth-order matrix with all terms on its main
diagonal positive, then a necessary and sufficient condition for all the $n$ characteristic
roots of $(A_1^{-1} A_2)$ to be smaller than unity in magnitude is that $A$ is positive
definite.
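For a $2 \times 2$ matrix the theorem can be checked by hand, and the sketch below does so. For $A = [[a, b], [b, d]]$ with $a, d > 0$ one finds $A_1^{-1} A_2 = [[0, b/a], [0, -b^2/(ad)]]$, so the characteristic roots are $0$ and $-b^2/(ad)$, and the condition that both are smaller than unity in magnitude reads $b^2 < ad$, which is positive definiteness. The function names are illustrative assumptions.

```python
def seidel_roots_2x2(a, b, d):
    """Characteristic roots of A1^{-1} A2 for A = [[a, b], [b, d]], a, d > 0.

    Here A1 = [[a, 0], [b, d]] and A2 = [[0, b], [0, 0]], which gives the
    upper triangular product [[0, b/a], [0, -b*b/(a*d)]].
    """
    return (0.0, -b * b / (a * d))


def is_positive_definite_2x2(a, b, d):
    """Leading principal minors test for a real symmetric 2 x 2 matrix."""
    return a > 0 and a * d - b * b > 0


if __name__ == "__main__":
    for a, b, d in [(2.0, 1.0, 2.0), (1.0, 2.0, 1.0)]:
        roots = seidel_roots_2x2(a, b, d)
        print((a, b, d), roots, is_positive_definite_2x2(a, b, d))
```

The second matrix in the usage lines has a positive diagonal but is not positive definite, and its nonzero root has magnitude 4, so the iteration cannot converge for it.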

Proof. Let $z_j$ be a characteristic vector of $(A_1^{-1} A_2)$ corresponding to the
characteristic root $\mu_j$. Then

(1)  $(A_1^{-1} A_2) z_j = \mu_j z_j.$

Premultiplying by $\bar{z}_i' A_1$, where the apostrophe and bar denote transposition
and conjugation respectively:

(2)  $\bar{z}_i' A_2 z_j = \mu_j \bar{z}_i' A_1 z_j.$

Consider the bilinear form $\bar{z}_i' A z_j$.

We have

(3)  $\bar{z}_i' A z_j = \bar{z}_i' A_1 z_j + \bar{z}_i' A_2 z_j = (1 + \mu_j) \bar{z}_i' A_1 z_j.$

Interchanging $i$ and $j$:

(4)  $\bar{z}_j' A z_i = (1 + \mu_i) \bar{z}_j' A_1 z_i.$

Taking the conjugate:

(5)  $z_j' A \bar{z}_i = \bar{z}_i' A z_j = (1 + \bar{\mu}_i) z_j' A_1 \bar{z}_i = (1 + \bar{\mu}_i) \bar{z}_i' A_1' z_j.$

Let $D$ be the diagonal matrix with elements

(6)  $D_{ij} = A_{ij} \delta_{ij}.$

This makes $A_1' = D + A_2$.

Substituting this in (5):

(7)  $\bar{z}_i' A z_j = (1 + \bar{\mu}_i)(\bar{z}_i' D z_j + \bar{z}_i' A_2 z_j) = (1 + \bar{\mu}_i) \bar{z}_i' D z_j + (1 + \bar{\mu}_i) \mu_j \bar{z}_i' A_1 z_j.$

Eliminating $\bar{z}_i' A_1 z_j$ between relations (3) and (7) we obtain

(8)  $(1 - \bar{\mu}_i \mu_j) \bar{z}_i' A z_j = (1 + \bar{\mu}_i)(1 + \mu_j) \bar{z}_i' D z_j.$





To obtain the necessary condition we use the fact that we must have $|\mu_i| < 1$,
and can therefore rewrite (8) as

(9)  $\bar{z}_i' A z_j = \frac{(1 + \bar{\mu}_i)(1 + \mu_j)}{1 - \bar{\mu}_i \mu_j} \bar{z}_i' D z_j = \sum_{k=0}^{\infty} (1 + \bar{\mu}_i) \bar{\mu}_i^k (1 + \mu_j) \mu_j^k \, \bar{z}_i' D z_j.$

If $x = \sum_{i=1}^{m} c_i z_i$ is any linear combination of the $m \le n$ independent characteristic
vectors of $(A_1^{-1} A_2)$ then

    $\bar{x}' A x = \left(\sum_{i=1}^{m} \bar{c}_i \bar{z}_i'\right) A \left(\sum_{j=1}^{m} c_j z_j\right) = \sum_{i,j=1}^{m} \bar{c}_i c_j \bar{z}_i' A z_j$

    $= \sum_{i,j=1}^{m} \bar{c}_i c_j \sum_{k=0}^{\infty} (1 + \bar{\mu}_i) \bar{\mu}_i^k (1 + \mu_j) \mu_j^k \, \bar{z}_i' D z_j,$

or

(10)  $\bar{x}' A x = \sum_{k=0}^{\infty} \bar{y}_k' D y_k,$

where

    $y_k = \sum_{i=1}^{m} c_i (1 + \mu_i) \mu_i^k z_i.$

Since by hypothesis $A_{ii} > 0$, $D$ is evidently positive definite, and therefore

(11)  $\bar{x}' A x > 0.$

In case the characteristic roots $\mu_i$, $(i = 1, 2, \ldots, n)$, are all distinct there will be $n$
independent $z_i$'s assured, and in that case (11) implies that $A$ is positive definite.
Consider, on the other hand, the case where the $\mu_i$ are not all distinct. Note
that (a) the definiteness properties of a matrix are not changed by sufficiently
small alterations in the elements; (b) the $\mu$'s depend continuously on the elements
of $A$; (c) the discriminant of (1) is a polynomial in the $A_{ij}$ that does not vanish
identically.$^2$ It follows that $A$ must be positive definite even in the case of repeated
roots because an arbitrarily small change in $A$ will separate any multiple
$\mu$'s, still keeping them smaller than unity in magnitude, and not changing the
definiteness properties of $A$.

This completes the proof that the condition given in the statement of the
theorem is necessary. Now to prove sufficiency:

Setting $i = j$ in relation (8) we obtain

(12)  $(1 - |\mu_i|^2) \bar{z}_i' A z_i = |1 + \mu_i|^2 \bar{z}_i' D z_i.$

Since both $A$ and $D$ are positive definite

(13)  $\bar{z}_i' A z_i > 0$ and $\bar{z}_i' D z_i > 0.$

Moreover, we cannot have $\mu_i = -1$ because that would mean by (3) that

    $0 = \bar{z}_i' A_1 z_i + \bar{z}_i' A_2 z_i = \bar{z}_i' A z_i.$

Relation (12) thus implies

(14)  $1 - |\mu_i|^2 > 0,$

i.e. $|\mu_i| < 1$ as was to be proved.

The part of the theorem giving the sufficient condition was already obtained
by L. Seidel [1] and G. Temple in a somewhat more indirect fashion.

$^2$ The fact that the discriminant is not identically zero follows from easily constructible
counter-examples.

REFERENCES

[1] L. Seidel, "Über ein Verfahren die Gleichungen, auf welche die Methode der kleinsten Quadrate führt, sowie lineare Gleichungen überhaupt, durch successive Annäherung aufzulösen," Abhandlungen der Mathematisch-Physikalischen Classe der Königlich Bayerischen Akademie der Wissenschaften, Vol. 11 (1874), pp. 81-108.

[2] C. E. Berry, "A criterion of convergence for the classical iterative method of solving linear simultaneous equations," Annals of Math. Stat., Vol. 16 (1945), pp. 398-400.

[3] L. Cesari, "Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni successive," Rassegna delle Poste, dei Telegrafi e dei Telefoni, Anno 9 (1931).

[4] L. Cesari, "Sulla risoluzione dei sistemi di equazioni lineari per approssimazioni successive," Reale Accademia Nazionale dei Lincei, Serie 6, Classe di Scienze fisiche, matematiche e naturali, Rendiconti, Vol. 25 (1937), pp. 422-428.

[5] J. Morris, The Escalator Method in Engineering Vibration Problems, Chapman and Hall Ltd., London, 1947, pp. 63-70.

[6] R. J. Schmidt, "On the numerical solution of linear simultaneous equations by an iterative method," Phil. Mag., Ser. 7, Vol. 32 (1941), pp. 369-383.


SOME RECURRENCE FORMULAE IN THE INCOMPLETE BETA 
FUNCTION RATIO 

By T. A. Bancroft 
Alabama Polytechnic Institute 

1. Introduction. It is well known that the incomplete beta function ratio,
defined by

(1)  $I_x(p, q) = \frac{B_x(p, q)}{B(p, q)},$

where

(2)  $B_x(p, q) = \int_0^x z^{p-1} (1 - z)^{q-1} \, dz$

and

(3)  $B(p, q) = B_1(p, q),$

is of importance in probability distribution theory, and, hence, also in obtaining
exact probability values in making tests of statistical hypotheses. In constructing
certain extensions [1] of Karl Pearson's "Tables of the Incomplete Beta-Function" [2],
the recurrence formulae contained in the following sections were derived.

2. Derivation of formulae. The incomplete beta function, $B_x(p, q)$, may be
considered as a special case of the hypergeometric series, $F(a, b, c, x)$, thus

(4)  $B_x(p, q) = \frac{x^p}{p} F(p, 1 - q, p + 1, x).$

The series converges for $|x| \le 1$, if and only if $a + b < c$. By setting $a = p$,
$b = 1 - q$, and $c = p + 1$, as in (4), all conditions are satisfied, if we also take
$q > 0$.

Recurrence formulae for $F(a, b, c, x)$, e.g., in the work of Magnus and Oberhettinger [3],
may now be directly converted for use with $B_x(p, q)$ or $I_x(p, q)$.
In particular, using the three identities on page 0 of [3], with $x$ replacing $z$, we
have

(5)  $c F(a, b, c, x) + (b - c) F(a + 1, b, c + 1, x) - b(1 - x) F(a + 1, b + 1, c + 1, x) = 0,$

(6)  $c(c - ax - b) F(a, b, c, x) - c(c - b) F(a, b - 1, c, x) + abx(1 - x) F(a + 1, b + 1, c + 1, x) = 0,$

(7)  $c F(a, b, c, x) - c F(a, b + 1, c, x) + ax F(a + 1, b + 1, c + 1, x) = 0,$

with $a = p$, $b = 1 - q$, and $c = p + 1$, we obtain in turn

(8)  $x I_x(p, q) - I_x(p + 1, q) + (1 - x) I_x(p + 1, q - 1) = 0,$

(9)  $(p + q - px) I_x(p, q) - q I_x(p, q + 1) - p(1 - x) I_x(p + 1, q - 1) = 0,$

(10)  $q I_x(p, q + 1) + p I_x(p + 1, q) - (p + q) I_x(p, q) = 0.$

Formula (8) is the basic recurrence formula used in the construction of Karl
Pearson's [2] tables. Formula (10) was obtained, incidentally, by the author [4]
in a different connection and manner.
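Recurrences (8), (9), and (10) can be spot-checked numerically. The sketch below evaluates $I_x(p, q)$ directly from definitions (1)-(3) and returns the left-hand sides of the three formulae, which should vanish up to quadrature error. The use of Simpson's rule, the restriction to $p, q \ge 1$, and the function names are assumptions made for illustration; this is not the method by which Pearson's tables were computed.

```python
import math


def inc_beta_ratio(x, p, q, steps=2000):
    """I_x(p, q) by Simpson's rule applied to (1)-(3); intended for p, q >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

    def integrand(z):
        if z <= 0.0:
            # z^(p-1) at z = 0: zero for p > 1, one for p = 1.
            return 0.0 if p > 1 else math.exp(-log_beta)
        return math.exp((p - 1) * math.log(z)
                        + (q - 1) * math.log1p(-z) - log_beta)

    h = x / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def recurrence_residuals(x, p, q):
    """Left-hand sides of (8), (9) and (10); each should be close to zero."""
    I = inc_beta_ratio
    r8 = x * I(x, p, q) - I(x, p + 1, q) + (1 - x) * I(x, p + 1, q - 1)
    r9 = ((p + q - p * x) * I(x, p, q) - q * I(x, p, q + 1)
          - p * (1 - x) * I(x, p + 1, q - 1))
    r10 = q * I(x, p, q + 1) + p * I(x, p + 1, q) - (p + q) * I(x, p, q)
    return r8, r9, r10


if __name__ == "__main__":
    print(recurrence_residuals(0.3, 3.5, 2.5))
```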

Formulae (8), (9), and (10) may now be combined to give other useful formulae,
e.g.,

(11)  $q I_x(p + 1, q + 1) + (px + qx - q) I_x(p + 1, q) - (p + q) x I_x(p, q) = 0,$

(12)  $p I_x(p + 1, q + 1) + (q - px - qx) I_x(p, q + 1) - (p + q)(1 - x) I_x(p, q) = 0,$

(13)  $(p + q - 1) x I_x(p - 1, q) - [(p + q - 1)x + p] I_x(p, q) + p I_x(p + 1, q) = 0,$

(14)  $(p + q)(1 - x) I_x(p + 1, q - 1) - [(p + q)(1 - x) + q] I_x(p + 1, q) + q I_x(p + 1, q + 1) = 0.$

Notice that the sum of the coefficients is always zero.

By a repeated use of (10) it is possible to obtain the formulae

(15)  $I_x(p + n, q) = \frac{1}{(p + n - 1)^{(n)}} \sum_{r=0}^{n} (-1)^r \binom{n}{r} (p + q + n - 1)^{(n-r)} (q + r - 1)^{(r)} I_x(p, q + r),$

(16)  $I_x(p, q + n) = \frac{1}{(q + n - 1)^{(n)}} \sum_{r=0}^{n} (-1)^r \binom{n}{r} (p + q + n - 1)^{(n-r)} (p + r - 1)^{(r)} I_x(p + r, q),$

where $(p + q + n - 1)^{(n-r)}$, etc., refer to the factorial notation, e.g.,

    $[p + q + (n - 1)]^{(n-r)} = (p + q + n - 1)(p + q + n - 2) \cdots (p + q + r).$

3. An application. Formulae (15) and (16) may be used to write general
formulae for obtaining values of $I_x(p, q)$ where $p$ or $q$ may be greater than 50,
i.e., for such values outside the range of Karl Pearson's tables. In particular,

(17)  $I_x(50 + n, q) = \frac{1}{(49 + n)^{(n)}} \left[ (n + q + 49)^{(n)} I_x(50, q) - \binom{n}{1} q (n + q + 49)^{(n-1)} I_x(50, q + 1) + \cdots + (-1)^n (q + n - 1)^{(n)} I_x(50, q + n) \right]$

and

(18)  $I_x(p, 50 + n) = \frac{1}{(49 + n)^{(n)}} \left[ (n + p + 49)^{(n)} I_x(p, 50) - \binom{n}{1} p (n + p + 49)^{(n-1)} I_x(p + 1, 50) + \cdots + (-1)^n (p + n - 1)^{(n)} I_x(p + n, 50) \right].$

It should be noted for (17) that as $n$ increases the range of values that can be
obtained outside Karl Pearson's tables is reduced, since the last term of (17)
contains $I_x(50, q + n)$. A similar observation holds for (18). From a practical
standpoint the computational labor restricts $n$ to fairly small values. Using (17)
we may easily compute, for example,

    $I_{.60}(52, 48) = I_{.60}(50 + 2, 48)$

    $= \frac{1}{(51)(50)} [(99)(98) I_{.60}(50, 48) - 2(99)(48) I_{.60}(50, 49) + (49)(48) I_{.60}(50, 50)].$

Substituting the necessary values from Karl Pearson's tables we calculate

    $I_{.60}(52, 48) = .9465248.$

Similarly using (18) we may calculate

    $I_{.40}(48, 52) = .0534752.$

As a check on the computations, we use the well-known identity

    $I_x(p, q) = 1 - I_{1-x}(p', q'),$

where $p' = q$ and $q' = p$. Then

    $I_{.40}(48, 52) = 1 - I_{.60}(52, 48)$

    $= 1 - .9465248$

    $= .0534752.$
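The worked example can be reproduced without tables. The sketch below implements the sum that raises the first argument by $n$ (the form specialized in (17)) and compares it with a direct evaluation of the integral. The Simpson quadrature standing in for Pearson's tables, the restriction to $p, q \ge 1$, and all function names are assumptions made for illustration.

```python
import math


def inc_beta_ratio(x, p, q, steps=2000):
    """I_x(p, q) by Simpson's rule on the defining integral; p, q >= 1, 0 < x < 1."""
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

    def integrand(z):
        if z <= 0.0:
            return 0.0 if p > 1 else math.exp(-log_beta)
        return math.exp((p - 1) * math.log(z)
                        + (q - 1) * math.log1p(-z) - log_beta)

    h = x / steps  # steps must be even
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0


def factorial_power(a, k):
    """a^(k) = a (a - 1) ... (a - k + 1) in the paper's notation; a^(0) = 1."""
    result = 1.0
    for i in range(k):
        result *= a - i
    return result


def raise_p_by_n(x, p, q, n):
    """I_x(p + n, q) expressed through I_x(p, q), ..., I_x(p, q + n)."""
    total = 0.0
    for r in range(n + 1):
        total += ((-1) ** r * math.comb(n, r)
                  * factorial_power(p + q + n - 1, n - r)
                  * factorial_power(q + r - 1, r)
                  * inc_beta_ratio(x, p, q + r))
    return total / factorial_power(p + n - 1, n)


if __name__ == "__main__":
    via_recurrence = raise_p_by_n(0.60, 50, 48, 2)
    direct = inc_beta_ratio(0.60, 52, 48)
    # Both should agree with each other and, approximately, with the
    # value .9465248 quoted in the text.
    print(via_recurrence, direct, 1.0 - direct)
```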


In like manner formulae (15) and (16) may be used to write general formulae
for obtaining half values for $p$ or $q$ greater than 10.5, i.e., for values not included
in Karl Pearson's tables. In particular,

(19)  $I_x(10.5 + n, q) = \frac{1}{(9.5 + n)^{(n)}} \left[ (9.5 + q + n)^{(n)} I_x(10.5, q) - \binom{n}{1} q (9.5 + q + n)^{(n-1)} I_x(10.5, q + 1) + \cdots + (-1)^n (q + n - 1)^{(n)} I_x(10.5, q + n) \right],$

and

(20)  $I_x(p, 10.5 + n) = \frac{1}{(9.5 + n)^{(n)}} \left[ (9.5 + p + n)^{(n)} I_x(p, 10.5) - \binom{n}{1} p (9.5 + p + n)^{(n-1)} I_x(p + 1, 10.5) + \cdots + (-1)^n (p + n - 1)^{(n)} I_x(p + n, 10.5) \right].$

Using (19) we may compute

    $I_{.60}(12.5, 8) = \frac{1}{(11.5)(10.5)} [(19.5)^{(2)} I_{.60}(10.5, 8) - 2(8)(19.5) I_{.60}(10.5, 9) + (9)(8) I_{.60}(10.5, 10)]$

    $= .4512367.$

Similarly using (20) we obtain

    $I_{.40}(8, 12.5) = .5487633.$

Employing the check formula,

    $I_{.40}(8, 12.5) = 1 - I_{.60}(12.5, 8)$

    $= 1 - .4512367$

    $= .5487633.$





Thanks are due to Dr. J. C. P. Miller, Technical Director, Scientific Computing
Service, Limited, London, England, for helpful suggestions in the preparation
of this paper.

REFERENCES

[1] T. A. Bancroft, "Some extensions of the incomplete beta function tables" (in preparation).

[2] Karl Pearson, Tables of the Incomplete Beta-Function, Cambridge University Press, 1934.

[3] Wilhelm Magnus und Fritz Oberhettinger, Formeln und Sätze für die Speziellen Funktionen der Mathematischen Physik, Julius Springer, Berlin, 1943.

[4] T. A. Bancroft, "On biases in estimation due to the use of preliminary tests of significance," Annals of Math. Stat., Vol. 15 (1944).


ON A THEOREM BY WALD AND WOLFOWITZ 


By Gottfried E. Noether 
New York University 

Let $\mathfrak{H}_n = (h_1, \ldots, h_n)$, $(n = 1, 2, \ldots)$, be sequences of real numbers and for
all $n$ denote by $H_{e_1 \cdots e_m}$ the symmetrical function generated by $h_{i_1}^{e_1} \cdots h_{i_m}^{e_m}$,
i.e., $H_{e_1 \cdots e_m} = \sum h_{i_1}^{e_1} \cdots h_{i_m}^{e_m}$ where the summation is extended over the $n(n - 1) \cdots (n - m + 1)$
possible arrangements of the $m$ integers $i_1, \ldots, i_m$, such that
$1 \le i_j \le n$ and $i_j \ne i_k$, $(j, k = 1, \ldots, m)$. According to Wald and Wolfowitz
[1] the sequences $\mathfrak{H}_n$ are said to satisfy condition W, if for all integral $r > 2$

    $\frac{\frac{1}{n} \sum_{i=1}^{n} (h_i - \bar{h})^r}{\left[\frac{1}{n} \sum_{i=1}^{n} (h_i - \bar{h})^2\right]^{r/2}} = O(1),^1$

where $\bar{h} = \frac{1}{n} \sum_{i=1}^{n} h_i$.

Given sequences $\mathfrak{A}_n = (a_1, \ldots, a_n)$ and $\mathfrak{D}_n = (d_1, \ldots, d_n)$, consider the
chance variable

    $L_n = d_1 x_1 + \cdots + d_n x_n,$

where the domain of $(x_1, \ldots, x_n)$ consists of the $n!$ equally likely permutations
of the elements of $\mathfrak{A}_n$. Then it is shown in [1] that if the sequences $\mathfrak{A}_n$ and $\mathfrak{D}_n$
satisfy condition W, the distribution of $L_n^0 = (L_n - EL_n)/\sigma(L_n)$ approaches the
normal distribution with mean 0 and variance 1 as $n \to \infty$. These conditions
for asymptotic normality can be weakened. It will be shown that the following
theorem holds:

Theorem. $L_n^0$ is asymptotically normal with mean 0 and variance 1 provided the
sequences $\mathfrak{D}_n$ satisfy condition W while for the sequences $\mathfrak{A}_n$

(1)  $\frac{\sum_{i=1}^{n} (a_i - \bar{a})^r}{\left[\sum_{i=1}^{n} (a_i - \bar{a})^2\right]^{r/2}} = o(1), \quad (r = 3, 4, \ldots).$

$^1$ The symbol $O$, as well as the symbols $o$ and $\sim$ to be used later, have their usual meaning.
See e.g. Cramér [2, p. 122].

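We note that $L_n^0$ is not changed if $a_i$ is replaced by $\left[\frac{1}{n} \sum (a_i - \bar{a})^2\right]^{-1/2} (a_i - \bar{a})$
and $d_i$ by $\left[\frac{1}{n} \sum (d_i - \bar{d})^2\right]^{-1/2} (d_i - \bar{d})$. Therefore it is sufficient to prove
asymptotic normality provided

(2)  $D_1 = 0, \quad D_2 = n, \quad D_r = O(n), \quad (r = 3, 4, \ldots),$

(3)  $A_1 = 0, \quad A_2 = n, \quad A_r = o(n^{r/2}), \quad (r = 3, 4, \ldots).$

Then

    $EL_n = D_1 E x_1 = 0,$

    $\operatorname{var} L_n = EL_n^2 = D_2 E x_1^2 + D_{11} E x_1 x_2$

    $= \frac{1}{n} A_2 D_2 + \frac{1}{n(n-1)} (A_1^2 - A_2)(D_1^2 - D_2) \sim n,$

and it is sufficient to show that $n^{-r/2} EL_n^r$ tends to the $r$th moment of a normal
distribution with mean 0 and variance 1.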
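The variance formula above can be verified exactly for small $n$ by enumerating all $n!$ permutations. In the sketch below the function names and the sample sequences are illustrative assumptions; the formula is also applied without the normalization $A_1 = D_1 = 0$, in which case the squared mean $(A_1 D_1 / n)^2$ is subtracted, a generalization added here and not stated in the paper.

```python
from fractions import Fraction
from itertools import permutations


def power_sum(values, r):
    """A_r or D_r: the sum of r-th powers of the sequence."""
    return sum(Fraction(v) ** r for v in values)


def exact_moments(a, d):
    """Mean and variance of L_n = sum d_i x_i over all n! permutations x of a."""
    values = [sum(Fraction(di) * xi for di, xi in zip(d, perm))
              for perm in permutations(a)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def variance_formula(a, d):
    """(1/n) A_2 D_2 + (A_1^2 - A_2)(D_1^2 - D_2) / (n (n - 1)), minus mean^2."""
    n = len(a)
    A1, A2 = power_sum(a, 1), power_sum(a, 2)
    D1, D2 = power_sum(d, 1), power_sum(d, 2)
    second_moment = A2 * D2 / n + (A1 ** 2 - A2) * (D1 ** 2 - D2) / (n * (n - 1))
    mean = A1 * D1 / n
    return second_moment - mean ** 2


if __name__ == "__main__":
    a = [-2, -1, 0, 1, 2]   # A_1 = 0
    d = [1, 3, -4, 2, -2]   # D_1 = 0
    print(exact_moments(a, d), variance_formula(a, d))
```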

Now we can write

    $\mu_r = n^{-r/2} EL_n^r = n^{-r/2} \sum_{i_1=1}^{n} \cdots \sum_{i_r=1}^{n} E \, d_{i_1} x_{i_1} \cdots d_{i_r} x_{i_r}$

(4)  $= n^{-r/2} [D_r E x_1^r + \cdots + c(r, e_1, \ldots, e_m) D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m} + \cdots + D_{1 \cdots 1} E x_1 \cdots x_r]$

where $e_1 + \cdots + e_m = r$ with $e_k$, $(k = 1, \ldots, m)$, positive integral and the
coefficient $c(r, e_1, \ldots, e_m)$ stands for the number of ways in which the $r$ indices
$i_1, \ldots, i_r$ can be tied in $m$ groups of size $e_1, \ldots, e_m$, respectively, so as to
produce the terms of $D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m}$.

Since $E x_1^{e_1} \cdots x_m^{e_m} \sim n^{-m} A_{e_1 \cdots e_m}$ we have

(5)  $n^{-r/2} D_{e_1 \cdots e_m} E x_1^{e_1} \cdots x_m^{e_m} \sim n^{-(r/2 + m)} D_{e_1 \cdots e_m} A_{e_1 \cdots e_m} = B(r, e_1, \ldots, e_m)$, say.

Lemma. $B(r, e_1, \ldots, e_m) \sim 0$ unless

(6)  $m = r/2, \quad e_1 = \cdots = e_{r/2} = 2.$

In that case $B(r, 2, \ldots, 2) \sim 1$.

Before proving this lemma we shall show that our theorem follows immediately.
By (4) $\mu_r$ is the sum of a finite number of expressions $B(r, e_1, \ldots, e_m)$.
Therefore if $r = 2s + 1$, $(s = 1, 2, \ldots)$, $\mu_{2s+1} \sim 0$, since at least one of the $e_k$,
$(k = 1, \ldots, m)$, in all the $B(2s + 1, e_1, \ldots, e_m)$ adding up to $\mu_{2s+1}$ must be odd.
If $r = 2s$, $\mu_{2s} \sim c(2s, 2, \ldots, 2)$. Since the first index in (4) can be tied with any
one of the other $2s - 1$ indices, the next free index with any one of the remaining
$2s - 3$ indices, etc., it is seen that $\mu_{2s} \sim (2s - 1)(2s - 3) \cdots 3$. However these
are the moments of a normal distribution with mean 0 and variance 1. This
proves the theorem.

Proof of Lemma. Define $A(j_1, \ldots, j_h) = A_{j_1} \cdots A_{j_h}$. Then $A_{e_1 \cdots e_m}$ is the
sum of a finite number of expressions $A(j_1, \ldots, j_h)$, where the $j_g$, $(g = 1, \ldots, h)$,
are obtained from $e_1, \ldots, e_m$ by addition in such a way that

(7)  $j_1 + \cdots + j_h = e_1 + \cdots + e_m = r.$

Since by (3) $A_1 = 0$, we need only consider those $A(j_1, \ldots, j_h)$ for which
$j_g \ge 2$, $(g = 1, \ldots, h)$. If some $j_g > 2$, by (3) and (7)

(8)  $A(j_1, \ldots, j_h) = o(n^{r/2}).$

If $j_g = 2$ for all $g$,

(9)  $A(2, \ldots, 2) = A_2^{r/2} = n^{r/2}.$

This last case can only happen if $r$ is even and $e_k$, $(k = 1, \ldots, m)$, equals either
1 or 2. Therefore, unless (6) is true

(10)  $m > r/2.$

Similarly, writing $D_{e_1 \cdots e_m}$ as a sum of products of the kind $D_{j_1} \cdots D_{j_h}$ it is
seen that by (2)

(11)  $D_{e_1 \cdots e_m} = O(n^m)$ if $m \le r/2$,  $D_{e_1 \cdots e_m} = O(n^{r/2})$ if $m > r/2$.

Thus by (8)-(11)

(12)  $A_{e_1 \cdots e_m} D_{e_1 \cdots e_m} = o(n^{r/2 + m}),$

unless (6) is true. In that case

(13)  $A_{2 \cdots 2} \sim A_2^{r/2} = n^{r/2},$

(14)  $D_{2 \cdots 2} \sim D_2^{r/2} = n^{r/2}.$

(12)-(14) together with (5) prove the lemma.

Let $a_1, a_2, \ldots$ be independent observations on the same chance variable $Y$.
We may ask what conditions have to be imposed on the distribution of $Y$ to
insure, at least with probability 1, that condition (1) is satisfied. Wald and
Wolfowitz state in Corollary 2 of [1] that provided $Y$ has positive variance and
finite moments of all orders the $a_1, a_2, \ldots$ satisfy condition W with probability
1 and therefore insure asymptotic normality of $L_n$ provided the sequences $\mathfrak{D}_n$
satisfy condition W. On the other hand, it can be shown that the $a_1, a_2, \ldots$
satisfy condition (1) with probability 1, provided $Y$ has positive variance and
a finite absolute moment of order 3. Thus condition (1) constitutes a considerable
improvement over condition W.


REFERENCES

[1] A. Wald and J. Wolfowitz, "Statistical tests based on permutations of the observations," Annals of Math. Stat., Vol. 15 (1944), pp. 358-372.

[2] H. Cramér, Mathematical Methods of Statistics, Princeton, 1946.


ON SUMS OF SYMMETRICALLY TRUNCATED NORMAL RANDOM 

VARIABLES 


By Z. W. Birnbaum and F. C. Andrews$^1$
University of Washington, Seattle 


1. Introduction. Let $X_a$ be the random variable with the probability density

(1.1)  $f_a(x) = C e^{-x^2/2}$ for $|x| \le a$,  $f_a(x) = 0$ for $|x| > a$,

obtained from the normal probability density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ by symmetrical truncation
at the "terminus" $|x| = a$, and let $S_a^{(m)}$ be the sum of $m$ independent sample-values
of $X_a$. We consider the following problem: An integer $m \ge 2$ and the real
numbers $A > 0$, $\epsilon > 0$ are given; how does one have to choose the terminus $a$
so that the probability of $|S_a^{(m)}| \ge A$ is equal to $\epsilon$,

(1.2)  $P(|S_a^{(m)}| \ge A) = \epsilon$?

This problem arises for example when single components of a product are
manufactured under statistical quality control, so that each component has the
length $Z = k + X$ where $X$ has the probability density $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, and the final
product consists of $m$ components so that its total length $S$ is the sum of the
lengths of the components. We wish to have probability $1 - \epsilon$ that $S$ differs
from $mk$ by not more than a given $A$. To achieve this we decide to reject each
single component for which $|Z - k| = |X| > a$; how do we determine $a$?

The exact solution of this problem would require laborious computations.$^2$
In the present paper methods are given for obtaining approximate values of $a$
which are "safe", that is such that

(1.3)  $P(|S_a^{(m)}| \ge A) \le \epsilon.$

In deriving these safe values, use will be made of theorems on random variables
with comparable peakedness, for which the reader is referred to a previous
paper [1].

$^1$ Research done under the sponsorship of the Office of Naval Research.

$^2$ A similar problem has been studied by V. J. Francis [2] for one-sided truncation; he
actually had the exact probabilities for the solution of his problem computed and tabulated
for $m = 2, 4$.

2. The safe value $a_1$. For fixed $a > 0$, we consider the normal random variable
$Y_a$ with expectation 0 and with probability density $g_a(Y_a)$ such that $g_a(0) = f_a(0)$.
It is easily seen that $Y_a$ has the standard deviation

(2.1)  $\sigma_a = \frac{1}{\sqrt{2\pi}} \int_{-a}^{a} e^{-t^2/2} \, dt,$

and that $g_a(\xi) \le f_a(\xi)$ for $|\xi| \le a$, $g_a(\xi) > 0 = f_a(\xi)$ for $|\xi| > a$. Hence, applying
Theorem 1 in [1], we conclude that

(2.2)  $P(|S_a^{(m)}| \ge A) \le \frac{2}{\sqrt{2\pi}} \int_{A/(\sigma_a \sqrt{m})}^{\infty} e^{-t^2/2} \, dt.$

If $m$, $A$, and $\epsilon$ are given, we determine $h$ from tables of the normal probability
integral so that $\frac{2}{\sqrt{2\pi}} \int_h^{\infty} e^{-t^2/2} \, dt = \epsilon$, set $\sigma_a = \frac{A}{h\sqrt{m}}$ in (2.1), and solve the
equation

(2.3)  $\frac{A}{h\sqrt{m}} = \frac{1}{\sqrt{2\pi}} \int_{-a}^{+a} e^{-t^2/2} \, dt$

for $a$ using again tables of the normal probability integral. In view of (2.2) this
solution satisfies (1.3) and hence is safe; it will be denoted by $a_1$.
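The procedure of this section amounts to two lookups in the normal probability integral, sketched below with Python's `statistics.NormalDist` in place of printed tables. The function name, and the convention of returning infinity when $A/(h\sqrt{m}) \ge 1$ (in which case the untruncated normal already meets the requirement and (2.3) has no finite solution), are assumptions made here.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def safe_terminus_a1(A, m, eps):
    """Solve (2.3) for a, with h chosen so the two-tail normal probability is eps."""
    h = STANDARD_NORMAL.inv_cdf(1.0 - eps / 2.0)
    sigma_a = A / (h * math.sqrt(m))
    if sigma_a >= 1.0:
        # Assumed convention: no truncation is required at all.
        return math.inf
    # (2.1) reads sigma_a = 2 * Phi(a) - 1, hence a = Phi^{-1}((1 + sigma_a) / 2).
    return STANDARD_NORMAL.inv_cdf((1.0 + sigma_a) / 2.0)


if __name__ == "__main__":
    # Case A = 3.8, m = 4, eps = .05, for which the paper reports 2.162.
    print(safe_terminus_a1(3.8, 4, 0.05))
```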


3. The safe value $a_2$. A direct application of Theorem 2 in [1] yields the
inequality

(3.1)  $P(|S_a^{(m)}| \ge A) \le \frac{1}{2^{m-1} m!} \sum_{\frac{1}{2}(m + A/a) \le j \le m} (-1)^{m-j} \binom{m}{j} \left(2j - m - \frac{A}{a}\right)^m = h_m\left(\frac{A}{a}\right)$

for $0 < A < ma$. Hence by equating $h_m(A/a)$ to $\epsilon$ and solving for $a$, we obtain a
safe value which will be denoted by $a_2$. It is of interest to note that (3.1) is true
not only for $f_a(x)$ defined by (1.1), i.e. truncated normal, but for any probability
density $f_a(x)$ which is symmetrical and unimodal, since these are the only assumptions
needed for Theorem 2 in [1].
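The bound in (3.1) is the two-tail probability for a sum of $m$ independent uniform variables on $(-a, +a)$, and $A/a_2$ can be found from it by bisection. The sketch below assumes the standard closed form for that tail probability (written here as a sum over $j$ with alternating signs) and uses hypothetical function names; it is offered as a way to recompute entries of the kind tabulated for $A/a_2$, not as the authors' computing procedure.

```python
import math


def h_m(u, m):
    """Two-tail probability P(|U| >= u*a) for a sum U of m uniforms on (-a, a)."""
    total = 0.0
    for j in range(m + 1):
        t = 2 * j - m - u
        if t > 0:  # only terms with j >= (m + u) / 2 contribute
            total += (-1) ** (m - j) * math.comb(m, j) * t ** m
    return total / (2 ** (m - 1) * math.factorial(m))


def ratio_for_a2(m, eps, iterations=100):
    """Solve h_m(u) = eps for u = A / a_2 by bisection on (0, m)."""
    lo, hi = 0.0, float(m)  # h_m decreases from 1 at u = 0 to 0 at u = m
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if h_m(mid, m) > eps:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    A, m, eps = 3.8, 4, 0.05
    u = ratio_for_a2(m, eps)
    print(u, A / u)  # A / a_2 and the resulting a_2
```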


4. Solution for large m. The random variable $X_a$ has the variance

(4.1)  $\sigma^2(X_a) = 1 + \frac{2\phi''(a)}{2\phi(a) - 1},$

where

    $\phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt.$

Hence, according to the central limit theorem, we have the approximate equality

(4.2)  $P(|S_a^{(m)}| \ge A) \approx \frac{2}{\sqrt{2\pi}} \int_{A/(\sigma(X_a)\sqrt{m})}^{\infty} e^{-t^2/2} \, dt$

for $m$ sufficiently large.

It can be reasonably expected that the cumulative distribution of $S_a^{(m)}$ differs
from its limiting normal probability integral by less than the cumulative distribution
of the sum $U_a^{(m)}$ of $m$ independent uniform variables in $(-a, +a)$ differs
from its limiting normal probability integral. Already for $m = 4$ the cumulative
distribution of $U_a^{(m)}$ differs from the corresponding normal cumulative by less
than .0075. Equally good or better approximation may, therefore, be expected
for the distribution of $S_a^{(m)}$, so that the error in the approximate equality (4.2)
between the two-tail probabilities should be less than .015 for $m = 4$, and still
less for $m > 4$.

Equating the right-hand term of (4.2) to $\epsilon$ and solving for $\sigma^2(X_a)$, we obtain

    $\sigma^2(X_a) = 1 + \frac{2\phi''(a)}{2\phi(a) - 1} = \frac{A^2}{h^2 m},$

an equation which can be solved for $a$ with the aid of tables of $\phi(x)$ and $\phi''(x)$.
We denote this value of $a$ by $a_3$.
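A minimal sketch of this last step follows, using $\phi''(a) = -a\,\phi'(a)$ so that the variance becomes $1 - 2a\,\phi'(a)/(2\phi(a) - 1)$, and solving for $a$ by bisection. The function names, the bisection bracket $(10^{-3}, 10)$, and the convention of returning infinity when the target variance is 1 or more are assumptions made here for illustration.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def truncated_variance(a):
    """Variance of the standard normal truncated symmetrically at |x| = a."""
    return 1.0 - 2.0 * a * STANDARD_NORMAL.pdf(a) / (2.0 * STANDARD_NORMAL.cdf(a) - 1.0)


def terminus_large_m(A, m, eps, iterations=200):
    """Solve truncated_variance(a) = A^2 / (h^2 m) for a."""
    h = STANDARD_NORMAL.inv_cdf(1.0 - eps / 2.0)
    target = A * A / (h * h * m)
    if target >= 1.0:
        # Assumed convention: the untruncated normal already suffices.
        return math.inf
    lo, hi = 1e-3, 10.0  # the variance increases with a on this bracket
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if truncated_variance(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    print(terminus_large_m(3.0, 4, 0.05))
```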


5. Use of the different solutions in practice. From the foregoing it appears
that the following procedure may be followed in solving our problem in any
definite case:

If $m$ is large, $a_3$ is very close to the exact solution of (1.3) and may be used
safely.

If $m$ is not large but $m \ge 5$, it is conjectured that $a_3$ is such that the left-hand
term in (1.3), for $a = a_3$, differs from $\epsilon$ by less than 0.015.

If $m \le 4$, the larger of $a_1$ and $a_2$ should be used. Table I contains the $A$ for
which $a_1$ and $a_2$ have the same value, say $a'$; $a_1$ or $a_2$ should be used if the given $A$
is greater or smaller, respectively, than the tabulated value. The value $a_1$ is
easily computed from a table of the normal probability integral by the procedure
of section 2. The value $a_2$ can be obtained by reading off $A/a_2$ from Table II.


TABLE I
Values of A for which a_1 = a_2 = a' for given m, ε

            m = 2             m = 3             m = 4
   ε       A       a'        A       a'        A       a'
 .001    4.568   2.357     5.446   2.008     6.152   1.842
 .002    4.258   2.228     5.059   1.918     5.717   1.779
 .005    3.808   2.047     4.512   1.799     5.111   1.697
 .01     3.438   1.910     4.074   1.712     4.632   1.640
 .02     3.034   1.765     3.614   1.630     4.131   1.589
 .05     2.456   1.581     2.970   1.533     3.425   1.529

TABLE II
Values of A/a_2 for given m, ε

   ε     m = 2   m = 3   m = 4
 .001    1.937   2.712   3.339
 .002    1.911   2.637   3.213
 .005    1.859   2.507   3.011
 .01     1.800   2.379   2.824
 .02     1.718   2.217   2.600
 .05     1.553   1.937   2.240


6. Examples. 1) $A = 3.8$, $m = 4$, $\epsilon = .05$. Since $A$ is greater than the value
3.425 in Table I, we compute $a_1 = 2.162$. From Table II we would obtain
$A/a_2 = 2.240$ and thus $a_2 = 1.696 < a_1$. 2) $A = 3$, $m = 4$, $\epsilon = .02$. Since $A < 4.131$,
we read $A/a_2 = 2.600$ from Table II and obtain $a_2 = 1.153$ which will be greater
than $a_1$. 3) $A = 5$, $m = 30$, $\epsilon = .05$. Using the method of section 4 we obtain
$a_3 = 1.62$.


REFERENCES

[1] Z. W. Birnbaum, "On random variables with comparable peakedness," Annals of Math. Stat., Vol. 19 (1948), pp. 76-81.

[2] V. J. Francis, "On the distribution of the sum of n sample values drawn from a truncated normal population," Roy. Stat. Soc. Jour. Suppl., Vol. 8 (1946), pp. 223-232.


A CERTAIN CUMULATIVE PROBABILITY FUNCTION 

By Sister Mary Agnes Hatke, O.S.F.

St. Francis College, Ft. Wayne, Indiana

Graduations of empirically observed distributions show that the cumulative
probability function $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ is a practical tool for fitting a
smooth curve to observed data. The graduations are comparable with those
obtained by the Pearson system, Charlier, and others and are accomplished
with simple calculations. Given distributions are graduated by the method of
moments. Theoretical frequencies are obtained by evaluation of consecutive
values of $F(x)$ by use of calculating machines and logarithms, and by differencing
$NF(x)$. No integration nor heavy interpolation is involved, such as may be
required in graduation by a classical frequency function. Burr [1] constructed
tables of $\nu_1$, $\sigma$, $\alpha_3$, and $\alpha_4$ values for the function $F(x)$ for certain combinations
of integral values of $1/c$ and $1/k$. In these tables curvilinear interpolation must
be used in finding an $F(x)$ with desired moments. The writer constructed more
extensive tables for the same cumulative function with $c$ and $k$ a variety of
real positive numbers less than or equal to one, such that linear interpolation
can be used to determine the parameters $c$ and $k$ for an $F(x)$ that has $\alpha_3$ and
$\alpha_4$ approximately the same as those of the distribution to be graduated. These
tables have been deposited with Brown University. Microfilm or photostat copies
may be obtained upon request to the Brown University Library.

The writer used the definitions of cumulative moments and the formulas
for the ordinary moments $\nu_1$, $\sigma$, $\alpha_3$, and $\alpha_4$ in terms of cumulative moments
as developed by Burr. These latter moments were tabulated for the function $F(x)$
having various combinations of parameters $c$ and $k$, $c$ ranging from 0.050 to 0.675
and $k$ from 0.050 to 1.000, each at intervals of 0.025. Within these ranges only
those combinations of $c$ and $k$ were used which yielded $\alpha_3$ of approximately 1 or
less and $\alpha_4$ values of 6 or less, since such moments are most common in practice.

It can be verified that over most of the area of the table <x 3 values obtained 



482 


SISTER MARY AGNES HATKE 


by linear and by curvilinear interpolation on k (or on c) differ by less than 0.001 
and values of a* by approximately 0.01 or less. If a» = constant and a* = constant 
curves are plotted on c, k axes, it will be seen that there exists only one solution 
(c, k ) of the equations a 3 » B(c, k) and a* = C(c, k ). Furthermore, some a< 
curves intersect two a 3 curves representing the same [ a a |. Thus the chance of 
finding an appropriate function F(x ) for graduation is increased since by reversal 
of scale an F(x) with a positive a a may be used to graduate a distribution with a 
negative a a , and conversely, 

Graduation of an observed frequency distribution is easily accomplished. 
Linear interpolation on k for a fixed c seems to be the best method for determining 



Fig. 1. The , S chart for the Pearson system of frequency curves and the area covered 
by /(*) = 1 — (1 + a: 1 '")- 1 '* (subsoript L - bell-sliapod) 

the parameters of an F{x) that has at exactly the same and at nearly the same as 
the observed a a and at, If the observed as and at are fairly close to an entry 
in the table, no interpolation is required. Direct linear interpolation is used to 
determine v : and <r for the c and k just found. Letting M and 8 be the mean and 
standard deviation of the given distribution, the formula, 

x~vi_ t _X — M 
<r " 1 8~ 

is used to translate the class limits X of the given distribution to the correspond¬ 
ing x’s of F(x). For any x that is negative the quantity 1 + x llt is taken as one 







A CUMULATIVE FUNCTION 


463 


to make F(-x) = 0 in accordance with the definition of F(x) [1]. The values of
$(1 + x^{1/c})^{1/k}$ for the various x's are computed by logarithms and differenced to
obtain the probabilities for the given class intervals, according to the equation

$$P(a < x \le b) = \int_a^b f(x)\,dx = F(b) - F(a).$$


The respective theoretical frequencies are these probabilities multiplied by N,
the number of cases.

The headings that proved satisfactory for the columns of the graduation
work-sheet are: class intervals (in observed physical units), X (u if unit class-interval
is used), $f_{ob}$, x, $1 + x^{1/c}$, $N/(1 + x^{1/c})^{1/k}$, and $f_{th}$.
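
The graduation steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper reads $\nu_1$ and $\sigma$ from its table by interpolation, whereas the sketch obtains them from the closed-form moments of F(x) (a gamma-function expression, valid only when c·r < 1/k for the r-th moment). The function names and the sample values of c, k, M, S and N are hypothetical.

```python
import math

def burr_cdf(x, c, k):
    """F(x) = 1 - (1 + x**(1/c))**(-1/k); taken as 0 for x <= 0, as in the text."""
    if x <= 0:
        return 0.0
    return 1.0 - (1.0 + x ** (1.0 / c)) ** (-1.0 / k)

def burr_raw_moment(r, c, k):
    """r-th moment about the origin (assumption: closed form, not the paper's table)."""
    a = 1.0 / k
    s = c * r
    if s >= a:
        raise ValueError("moment does not exist for these c, k")
    return math.gamma(1.0 + s) * math.gamma(a - s) / math.gamma(a)

def graduate(class_limits, M, S, N, c, k):
    """Theoretical frequencies for consecutive class limits X (observed units)."""
    nu1 = burr_raw_moment(1, c, k)
    sigma = math.sqrt(burr_raw_moment(2, c, k) - nu1 ** 2)
    # Translate class limits: (x - nu1) / sigma = (X - M) / S
    xs = [nu1 + sigma * (X - M) / S for X in class_limits]
    cdf = [burr_cdf(x, c, k) for x in xs]
    # Difference the cumulative values and multiply by the number of cases N
    return [N * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

# Hypothetical example: ten classes of width 10, observed mean 40 and s.d. 15
freqs = graduate(list(range(0, 101, 10)), M=40.0, S=15.0, N=200, c=0.5, k=0.1)
```

The list `freqs` plays the role of the $f_{th}$ column of the work-sheet; its total falls short of N only by the probability lying outside the outermost class limits.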

The relation of F(x) to the Pearson system of frequency curves is presented in
Figure 1, which is a reproduction of a major part of Craig's chart for $\alpha_3^2$ and
$\delta$ [2]. In this chart the parameters of the twelve Pearson curves are expressed in
terms of $\alpha_3^2$ and $\delta$, where $\delta = (2\alpha_4 - 3\alpha_3^2 - 6)/(\alpha_4 + 3)$. Values of $\alpha_3^2$ and $\delta$
were computed for $F(x) = 1 - (1 + x^{1/c})^{-1/k}$ in which c and k were assigned
the values listed in the $\alpha_3$, $\alpha_4$ table. The dotted area superimposed on the Craig
chart is that covered by these $\alpha_3^2$, $\delta$ values for F(x). Although it is small in size
compared to the total area, it contains a part of the areas representing the three
main Pearson curves, I, IV, and VI, as well as the point for the normal curve
and part of the line on which lie the points corresponding to the bell-shaped
curves of the Type III functions. It also includes transitional Types V and VII.
Thus the function F(x) covers part of an important area on the $\alpha_3^2$, $\delta$ chart for
the Pearson curves.
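
As a small illustration of the chart coordinates, the following Python sketch maps a pair ($\alpha_3$, $\alpha_4$) to the point ($\alpha_3^2$, $\delta$) using the expression for $\delta$ quoted above. The function names are hypothetical, and the two sample points (the normal curve, and a gamma curve with shape 4 standing in for a bell-shaped Type III) are chosen only because their moments are standard.

```python
def craig_delta(alpha3, alpha4):
    """delta = (2*alpha4 - 3*alpha3**2 - 6) / (alpha4 + 3)."""
    return (2.0 * alpha4 - 3.0 * alpha3 ** 2 - 6.0) / (alpha4 + 3.0)

def chart_point(alpha3, alpha4):
    """Coordinates (alpha3**2, delta) of a distribution on the chart."""
    return alpha3 ** 2, craig_delta(alpha3, alpha4)

normal_point = chart_point(0.0, 3.0)   # normal curve: alpha3 = 0, alpha4 = 3
gamma4_point = chart_point(1.0, 4.5)   # gamma, shape 4: alpha3 = 1, alpha4 = 4.5
```

Both sample points have $\delta = 0$, which agrees with the normal point lying on the Type III line of the chart.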

The function F(x) was used to graduate satisfactorily several observed distributions
classified as Pearson types, including the three main Types, I, IV, and
VI, and transitional Types III and VII.

One advantage in the use of this cumulative function F(x) is that it takes but
one symbolic form within the area covered, whereas the Pearson-system curves
require several different expressions of various complexity requiring identification
of type. Furthermore, graduation by a Pearson function generally involves
approximate integration or heavy interpolation in the incomplete beta function
tables for the evaluation of the integrals of the Pearson functions, whereas
graduation by a function F(x) is easily and quickly performed since F(x) only
involves two number-parameters readily determined by means of the $\alpha_3$, $\alpha_4$
table and straight arithmetic.


The writer is deeply indebted to Professor Irving W. Burr of Purdue Uni¬ 
versity for valuable suggestions in this study. 

REFERENCES 

[1] I. W. Burr, "Cumulative frequency functions," Annals of Math. Stat., Vol. 13 (1942),
pp. 215-232.

[2] C. C. Craig, "A new exposition and chart for the Pearson system of frequency curves,"
Annals of Math. Stat., Vol. 7 (1936), pp. 16-28.



ABSTRACTS OF PAPERS 


(Presented at the Berkeley Meeting of the Institute, June 16-18, 1949)

1. Extension of a Theorem of Blackwell. E. W. Barankin, University of California,
Berkeley.

It is proved that Blackwell's method of uniformly improving the variance of an unbiased
estimate by taking the conditional expectation with respect to a sufficient statistic,
is, in fact, similarly effective on every absolute central moment of order $s \ge 1$. The method
leads to finer detail concerning the relationship between an estimate and its thus derived
one. (This paper was prepared with the partial support of the Office of Naval Research.)
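
A small numerical illustration may help fix the idea. The Python sketch below is not taken from the paper: it assumes n independent Bernoulli(p) observations, takes the first observation as a crude unbiased estimate of p, and compares its absolute central moments with those of its conditional expectation given the sufficient statistic (the sample total), which is the sample mean. All names and parameter values are hypothetical.

```python
from itertools import product

def abs_central_moments(n, p, s):
    """Exact E|estimate - p|**s for the crude and the improved estimate."""
    crude = 0.0
    improved = 0.0
    for sample in product((0, 1), repeat=n):
        total = sum(sample)
        prob = p ** total * (1.0 - p) ** (n - total)
        crude += prob * abs(sample[0] - p) ** s        # estimate X_1
        improved += prob * abs(total / n - p) ** s     # E[X_1 | total] = total / n
    return crude, improved

# Orders s = 1, 1.5, 2, 3 for n = 4 observations with p = 0.3
results = {s: abs_central_moments(4, 0.3, s) for s in (1, 1.5, 2, 3)}
```

For every order listed, the second entry of the pair is no larger than the first, which is the behaviour the abstract asserts for all orders $s \ge 1$.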

2. On the Existence of Consistent Tests. Agnes Berger, Columbia University,
New York.

Let $\mathfrak{M}(\mathfrak{B})$ denote the space of all probability-measures defined over a common Borel-field
$\mathfrak{B}$. Let {m} = M, {m'} = M' be two disjoint subsets of $\mathfrak{M}(\mathfrak{B})$ and let $H_0$ ($H_1$) be the
hypothesis stating that the unknown distribution is in M (M'). In Neyman's terminology
$H_0$ can be consistently tested against $H_1$ if to any preassigned $\epsilon > 0$ there exists an integer
n and a critical region in the product-space of n independent observations such that the
probabilities of the errors of the first and second kind corresponding to this region are
simultaneously smaller than $\epsilon$. A sufficient condition which for a certain type of consistent
test is also necessary is established. The condition is satisfied whenever the disjoint sets
M and M' are closed and compact with respect to a certain suitable topology introduced
on $\mathfrak{M}(\mathfrak{B})$. Thus for instance $H_0$ can be consistently tested against $H_1$ if M and M' contain
only a finite number of measures or if the measures in M resp. M' depend continuously on
a parameter ranging over a closed and bounded subset of some Euclidean space.

3. Effect of Linear Truncation in a Multinormal Population. Z. William Birnbaum,
University of Washington, Seattle.

Let $(X, Y_1, \dots, Y_{n-1})$ have a non-singular n-dimensional normal probability
density $f(X, Y_1, Y_2, \dots, Y_{n-1})$ for which all parameters are given, and let
$\varphi(X, Y_1, Y_2, \dots, Y_{n-1})$ be the probability density obtained from f by truncation along a
given hyperplane: $\varphi = Cf$ for $a_1 Y_1 + \cdots + a_{n-1} Y_{n-1} \le aX + b$, $\varphi = 0$ elsewhere. What is the marginal
distribution of X for this truncated distribution? This question can be answered by using
a set of tables with only two parameters. These tables also make it possible to solve problems
such as: determine the plane of truncation so that the marginal distribution of X has
certain required properties. (This paper was prepared under the sponsorship of the Office
of Naval Research.)

4. Statistical Problems in the Theory of Counters. (Preliminary Report). Colin
R. Blyth, University of California, Berkeley.

The assumptions made about counter action and distribution of incident particles are
the same as those of B. V. Gnedenko [On the theory of Geiger-Müller counters, Journ. Exper.
Teor. Phiz., Vol. 11 (1941)]. The distribution of the number X of particles registered
during a given time (0, t) is found explicitly, in terms of the density a(v) of incident particles
at time v. The problem considered is that of estimating the parameters of a(v). For
the special case a(v) = a, the distribution of X reduces to

$$P\{X = x\} = \frac{a^x (t - x\tau)^x e^{-a(t - x\tau)}}{x!} + e^{-a(t - x\tau)} \sum_{i=0}^{x-1} \frac{a^i (t - x\tau)^i}{i!} - e^{-a[t - (x-1)\tau]} \sum_{i=0}^{x-1} \frac{a^i [t - (x-1)\tau]^i}{i!}$$

for x = 1, 2, ..., s; $P\{X = 0\} = e^{-at}$; $P\{X = s + 1\} = 1 - e^{-a(t - s\tau)} \sum_{i=0}^{s} a^i (t - s\tau)^i / i!$;
$P\{X > s + 1\} = 0$. This distribution has been found in another
problem by J. Neyman [On the problem of estimating the number of schools of fish, submitted
to Statistical Series, Univ. of Calif. Press]. For this special case the maximum likelihood
estimate $\hat{a}$ of a is found to be given by $\hat{a}\tau \exp(\hat{a}\tau) = \{1 + \tau/(t - x\tau)\}^x \, x\tau/(t - x\tau)$. If
$\tau/(t - x\tau)$ is small, as will usually be the case, $\hat{a}$ will be close to the estimate $x/(t - x\tau)$
usually used for a.
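
The special case a(v) = a lends itself to a short numerical check. The Python sketch below tabulates the probabilities and solves the likelihood equation by bisection. It rests on one assumption not stated in the surviving text: s is taken to be the integer part of t/τ. The function names and the numerical values are hypothetical.

```python
import math

def _g(x, a, t, tau):
    """exp(-m) * sum_{i=0}^{x} m**i / i!  with  m = a * (t - x * tau)."""
    m = a * (t - x * tau)
    term = total = 1.0
    for i in range(1, x + 1):
        term *= m / i
        total += term
    return math.exp(-m) * total

def counter_pmf(a, t, tau):
    """List [P{X=0}, P{X=1}, ..., P{X=s+1}] for constant intensity a."""
    s = int(t // tau)  # ASSUMPTION: s is the integer part of t / tau
    pmf = [math.exp(-a * t)]
    for x in range(1, s + 1):
        # Algebraically the same as the three-term expression in the abstract
        pmf.append(_g(x, a, t, tau) - _g(x - 1, a, t, tau))
    pmf.append(1.0 - _g(s, a, t, tau))
    return pmf

def mle_rate(x, t, tau):
    """Solve w * exp(w) = {1 + tau/(t - x*tau)}**x * x*tau/(t - x*tau), w = a*tau."""
    rest = t - x * tau
    target = (1.0 + tau / rest) ** x * x * tau / rest
    lo, hi = 0.0, target          # w * exp(w) >= w, so the root lies in [0, target]
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / tau

pmf = counter_pmf(a=2.0, t=10.0, tau=0.3)
a_hat = mle_rate(x=50, t=100.0, tau=0.01)
naive = 50 / (100.0 - 50 * 0.01)   # the estimate x / (t - x * tau)
```

With τ/(t - xτ) small, as in the second example, `a_hat` and `naive` agree closely, in line with the closing remark of the abstract.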

5. Some Two-Sample Tests. Douglas G. Chapman, University of California,
Berkeley.

Let X, Y be random variables normally distributed with means $\xi$, $\eta$, variances $\sigma_1^2$, $\sigma_2^2$
respectively. The two sample procedure formulated by Stein to obtain a test with power
independent of $\sigma$, for the hypothesis $\xi = \xi_0$ is used here to determine a test for the hypothesis
$\xi/\eta = r$ (r any pre-assigned real number). The size and power of this test are independent of
$\sigma_1$ and $\sigma_2$. The two sample procedure may be extended to the more general case of testing
the hypothesis of equality of means of several normal populations, the variances being
unknown. Approximate tests are obtained for this case. Finally it is shown that this two
sample procedure can be used to select that normal population, of several, with the greatest
mean, the rule of selection having a preassigned level of accuracy. (This paper was prepared
with the partial support of the Office of Naval Research.)

6. Minimum Variance in Non-Regular Estimation. R. C. Davis, U. S. Naval 
Ordnance Test Station, Inyokern. 

The Cramér-Rao inequality for the minimum variance of a regular estimate of an unknown
parameter of a probability distribution is extended to a broad class of non-regular
types of estimation. The theory is developed only for the case in which a probability density
function and a sufficient statistic for the unknown parameter exist. For every non-regular
estimation problem included in the above class, it is proved that there exists a
unique unbiased estimate which attains minimum variance, and a method is given for
obtaining the sample estimate. Examples are given, such as the rectangular distribution,
a class of truncated distributions, etc.

7. Auxiliary Random Variables. Mark W. Eudey, California Municipal Statistics,
Inc., San Francisco.

In testing hypotheses concerning discontinuous random variables it is not possible to
find regions of arbitrary size, and so if we compare two critical regions, selection between
them on the basis of the usual criteria of the Neyman-Pearson theory of testing hypotheses
may be confused by the difference in their sizes. This difficulty may be avoided by allowing
the statistician to use a mixed strategy in such cases, and make his decision to accept or
reject the hypothesis depend upon an independent auxiliary random variable. For example,
if K is a binomial variable, and U has a uniform distribution (0, 1), then Z = K + U may
be used to test hypotheses concerning the binomial parameter, and regions of any size may
be found. For the binomial case this procedure leads to a class of uniformly most powerful
tests for one-sided alternatives, and to uniformly most powerful unbiased tests for two-sided
alternatives. Similar results are obtained for other common discontinuous variables,
and the same device may be used in considering confidence regions and decision functions
for such variables. (This paper was prepared with the partial support of the Office of
Naval Research.)
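
The device Z = K + U can be made concrete with a short Python sketch. It is only an illustration, assuming a one-sided test that rejects for large Z and a level strictly between 0 and 1; the function names and the values n = 10, p0 = 0.5, α = 0.05 are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def critical_value(n, p0, alpha):
    """z_c such that P(K + U > z_c) = alpha exactly when K ~ Binomial(n, p0)."""
    tail = 0.0   # running value of P(K > k)
    k = n
    while k > 0 and tail + binom_pmf(k, n, p0) <= alpha:
        tail += binom_pmf(k, n, p0)
        k -= 1
    gamma = (alpha - tail) / binom_pmf(k, n, p0)   # share of the atom at k rejected
    return k + 1.0 - gamma

def rejection_probability(z_c, n, p):
    """P(K + U > z_c) for K ~ Binomial(n, p), U uniform on (0, 1), independent."""
    k = int(z_c)
    frac = z_c - k
    upper = sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))
    return upper + (1.0 - frac) * binom_pmf(k, n, p)

z_c = critical_value(10, 0.5, 0.05)
size = rejection_probability(z_c, 10, 0.5)
power = rejection_probability(z_c, 10, 0.7)
```

Because Z has a continuous distribution, the region Z > z_c has size exactly α, which no region defined on K alone can achieve for this n and p0.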


8. Estimation in Truncated Samples. Max Halperin, The Rand Corporation,
Santa Monica, California.

A death process is considered which starts with n individuals of zero age, each following
the mortality law, $f(x, \theta)$. That is,

$$F(t) = \Pr\{\text{Age at death} < t\} = \int_0^t f(x, \theta)\,dx,$$

where $f(x, \theta)$ is a probability density. We suppose we truncate the process at a fixed time,
T, and wish to estimate $\theta$ when

a) individuals who die are not replaced, and

b) individuals who die are replaced by individuals of zero age following the mortality
law, $f(x, \theta)$.

In both cases, it is found that, under mild conditions, estimation by Maximum Likelihood
gives optimum estimates. The estimates are best in the sense of being asymptotically
normally distributed and of minimum variance for large samples.

The proofs are given for the case of a single parameter, but can be extended to the multi-parameter
case. Examples are given.
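
To illustrate case a), the following Python sketch works one example through. The abstract does not name its examples, so the exponential mortality law $f(x, \theta) = \theta e^{-\theta x}$ is an assumption made here purely for illustration; for it the likelihood of d observed deaths among n individuals truncated at T is proportional to $\theta^d \exp\{-\theta[\sum x_i + (n - d)T]\}$, which is maximized in closed form. The function name and data are hypothetical.

```python
def mle_exponential_truncated(death_ages, n, T):
    """ML estimate of theta when deaths are not replaced and observation stops at T.

    ASSUMPTION: exponential mortality law f(x, theta) = theta * exp(-theta * x).
    """
    d = len(death_ages)
    if d == 0:
        raise ValueError("no deaths observed before T; the estimate is 0")
    if any(x < 0 or x > T for x in death_ages) or d > n:
        raise ValueError("death ages must lie in [0, T] and number at most n")
    # Total time lived under observation: the deaths plus the n - d survivors at T
    exposure = sum(death_ages) + (n - d) * T
    return d / exposure

theta_hat = mle_exponential_truncated([1.0, 2.0, 3.0], n=5, T=4.0)
```

Here three deaths contribute 6 units of exposure and two survivors contribute 8, so the estimate is 3/14.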


9. Some Problems in Point Estimation. J. L. Hodges, Jr. and E. L. Lehmann, 
University of California, Berkeley. 

Some point estimation problems are considered in the light of Wald's general theory. It
is shown that when the loss function is convex, one may restrict consideration to nonrandomized
estimates based on sufficient statistics. Minimax estimates are obtained in a
number of cases connected with the binomial and hypergeometric distributions, and with
some non-parametric problems. Some prediction problems are also considered. (This paper
was prepared with the partial support of the Office of Naval Research.)
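
As an illustration of a minimax estimate in the binomial case, the Python sketch below evaluates the risk of the estimate $(X + \sqrt{n}/2)/(n + \sqrt{n})$ under squared-error loss. The abstract does not state which loss functions or estimates were treated, so both the loss and the estimator are assumptions made here for illustration: this is the familiar constant-risk estimate for that loss. Function names are hypothetical.

```python
from math import comb, sqrt

def risk(estimator, n, p):
    """Expected squared error of estimator(x, n) when X ~ Binomial(n, p)."""
    return sum(
        comb(n, x) * p ** x * (1.0 - p) ** (n - x) * (estimator(x, n) - p) ** 2
        for x in range(n + 1)
    )

def constant_risk_estimate(x, n):
    # ASSUMPTION: squared-error loss; estimate (x + sqrt(n)/2) / (n + sqrt(n))
    return (x + sqrt(n) / 2.0) / (n + sqrt(n))

def sample_proportion(x, n):
    return x / n

n = 16
risks_minimax = [risk(constant_risk_estimate, n, p) for p in (0.1, 0.5, 0.9)]
worst_proportion = risk(sample_proportion, n, 0.5)   # maximum of p(1 - p)/n
```

The risk of the first estimate is the same at every p, namely $1/[4(1 + \sqrt{n})^2]$, and lies below the maximum risk of the sample proportion.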


10. Completeness in the Sequential Case. E. L. Lehmann and C. Stein, University
of California, Berkeley.

Recently, in a series of papers, Girshick, Mosteller, Savage and Wolfowitz have considered
the uniqueness of unbiased estimates depending only on an appropriate sufficient
statistic for sequential sampling schemes of binomial variables. A complete solution was
obtained under the restriction to bounded estimates. This work, which has immediate
consequences with respect to the existence of unbiased estimates with uniformly minimum
variance, is extended here in two directions. A general necessary condition for uniqueness
is found, and this is applied to obtain a complete solution of the uniqueness problem when
the random variables have a Poisson or rectangular distribution. Necessary and sufficient
conditions are also found in the binomial case without the restriction to bounded estimates.
This permits the statement of a somewhat stronger optimum property for the estimates,
and is applicable to the estimation of unbounded functions of the unknown probability.

11. The Ratio of Ranges. Richard F. Link, University of Oregon, Eugene. 

The distribution of the ratio of two ranges from independent samples drawn from a
normal population is given analytically for $n_1$ and $n_2 \le 3$. A table of percentage values, R,
is given for $\alpha$ = .005, .01, .025, .05, .10 and for all combinations of $n_1$ and $n_2$ up to 10, where
$\alpha = \Pr(w_1/w_2 > R)$ and $w_1$ and $w_2$ are the observed ranges. (This paper was prepared under
the sponsorship of the Office of Naval Research.)
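
The tabulated percentage values are not reproduced in the abstract, but the quantity $\alpha = \Pr(w_1/w_2 > R)$ can be approximated by simulation. The Python sketch below is a Monte Carlo illustration only, not the analytical method of the paper; the function name, trial count and seed are hypothetical.

```python
import random

def range_ratio_tail(n1, n2, R, trials=20000, seed=0):
    """Monte Carlo estimate of Pr(w1 / w2 > R) for two independent normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        s2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        w1 = max(s1) - min(s1)
        w2 = max(s2) - min(s2)
        if w1 > R * w2:      # same event as w1 / w2 > R, without dividing
            hits += 1
    return hits / trials

# With equal sample sizes the two ranges are exchangeable, so Pr(w1/w2 > 1) = 1/2
estimate = range_ratio_tail(5, 5, 1.0)
```

The equal-size case with R = 1 gives a simple sanity check, since by symmetry the true probability is one half.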


12. Some Problems Arising in Plant Selection and the Use of Analysis of 
Variance. Stanley W. Nash, University of California, Berkeley. 

The yields of many (m) varieties are compared in a field trial. A few varieties having the
highest and lowest yields in this trial are selected for further testing. What chance is there
that the first trial will give a significant result, the second trial not? Let $\xi_v$ denote the true
mean yield of the vth variety, and assume that the $\xi_v$ are themselves normally, independently
distributed with variance $\sigma_\xi^2$. Let $P_k$ (k = 1, 2) denote the probability of a significant result
in the kth trial, using the F-test. For fixed $\sigma_\xi^2 > 0$, $\lim_{m \to \infty} P_1 = 1$. (See Nash, Annals of
Math. Stat., Vol. 19 (1948), p. 434.) Now let $\sigma_\xi^2 > 0$ take on a decreasing sequence of values

1 / 23(F)\ a 

as m increases If - r — = 0 (- 1 ), then lim„_ M Pi =» 1. Here 1 + o\g(m) — 

g{m) \ <T„ f 

^numerator of F) ^ S o lim»-,_ Pi < 1 if and only if o’ = 0^^===^. For <rj 


(= error variance) 


= 0 ( = - ) ,lim m _„P 2 = a, the level of significance used. Thus, corresponding to any 

\Vlog mf 

m, however large, one can find values of $\sigma_\xi^2$ for which the chances are considerable (or even
approaching $1 - \alpha$), that the two field trials will give opposite conclusions when the F-test
is used.


13. Asymptotic Properties of the Wald-Wolfowitz Test of Randomness. Gottfried
E. Noether, Columbia University, New York.

Let $a_1, \dots, a_n$ be observations on the chance variables $X_1, \dots, X_n$. Wald and Wolfowitz
(Annals of Math. Stat., Vol. 14 (1943), pp. 378-388) have shown how the statistic
$R_h = \sum_{i=1}^{n} x_i x_{i+h}$, $(x_{n+j} = x_j)$, can be used to test the null hypothesis that the $X_i$, (i = 1, ...,
n), are independently and identically distributed by considering the distribution of $R_h$ in
the subpopulation of all permutations of the $a_i$. In the present paper it is shown that when
the null hypothesis is true this distribution of $R_h$ is asymptotically normal provided
$\sum_{i=1}^{n} (a_i - \bar{a})^r / [\sum_{i=1}^{n} (a_i - \bar{a})^2]^{r/2} = o[n^{(2-r)/4}]$, (r = 3, 4, ...), a condition which is satisfied
with probability 1 if the $a_i$ are independent observations on the same chance variable X
having positive variance and a finite absolute moment of order $4 + \delta$, ($\delta > 0$). Conditions
are given for the consistency of the test based on $R_h$ when under the alternative hypothesis
observations are drawn independently from changing populations. In particular a downward
trend and a regular cyclical movement are considered, both for ranks and original
observations. For the special case of a regular cyclical movement of known length the
asymptotic relative efficiency of the rank test with respect to the test performed on original
observations is found. It is shown that when using ranks, $R_h$ is asymptotically normal
under the alternative hypothesis provided $\liminf_{n \to \infty} \operatorname{var}(n^{-5/2} R_h) > 0$. This asymptotic
normality of $R_h$ is used to compare the asymptotic power of the $R_h$-test with that of the
Mann T-test (Econometrica, Vol. 13 (1945), pp. 245-259) for the case of a downward trend.
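
The statistic and its permutation distribution can be illustrated on a very small sample. The Python sketch below computes $R_h$ with the circular convention $x_{n+j} = x_j$, enumerates all permutations of the observations, and compares the exact permutation mean with the closed form $(S_1^2 - S_2)/(n - 1)$, where $S_1 = \sum a_i$ and $S_2 = \sum a_i^2$. That closed form is a standard consequence of the equal weighting of permutations, quoted here as background rather than taken from the abstract; the function names and data are hypothetical.

```python
from itertools import permutations

def serial_statistic(x, h=1):
    """R_h = sum_i x_i * x_{i+h}, with indices taken circularly (x_{n+j} = x_j)."""
    n = len(x)
    return sum(x[i] * x[(i + h) % n] for i in range(n))

def permutation_distribution(a, h=1):
    """Values of R_h over all permutations of the observations (small n only)."""
    return [serial_statistic(p, h) for p in permutations(a)]

def permutation_mean_formula(a):
    """(S1**2 - S2) / (n - 1), valid when h is not a multiple of n."""
    s1 = sum(a)
    s2 = sum(v * v for v in a)
    return (s1 * s1 - s2) / (len(a) - 1)

a = [1.0, 2.0, 4.0, 7.0, 11.0]
values = permutation_distribution(a, h=1)
exact_mean = sum(values) / len(values)
observed = serial_statistic(a, h=1)
# One-sided permutation p-value: share of permutations with R_h >= observed
p_value = sum(v >= observed for v in values) / len(values)
```

Full enumeration is feasible only for very small n, which is the practical reason a normal approximation of the kind studied in the paper is wanted.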

14. On the Similar Regions of a Class of Distributions. Stefan Peters, University
of California, Berkeley.

The class of distributions considered is essentially the class of those distributions of n
variables which, by a suitable transformation of the variables and the parameter, can be
transformed into distributions defined in the whole $R_n$ for which the parameter is a location
parameter. These regions satisfy a certain partial differential equation. The transformed
distributions of the variables $y_1, \dots, y_n$ and parameter $\tau$ possess a class $D_1$ of similar
regions with respect to $\tau$ which can be defined as the smallest additive class of regions
which includes all regions defined by

$$g[(y_1 - y_n), \dots, (y_{n-1} - y_n)] \le C$$

where g is a continuous function. The class does not exhaust all similar regions. There
exists among the regions of class $D_1$ one which is most powerful for testing a given additional
parameter a. If there exists among all similar regions a most powerful region for
testing a, then that region will be the most powerful region of class $D_1$.

15. Some Problems in Sequential Analysis. Charles M. Stein, University of 
California, Berkeley. 

Wald's fundamental identity for cumulative sums is extended to dependent random
variables. The first derivative of this at the origin is equivalent to a result of Wolfowitz
(Annals of Math. Stat., Vol. 18 (1947), p. 228, Th. 7.4). Higher derivatives of this at the
origin can also be obtained from linear combinations of Wolfowitz's result applied to suitable
products of the original random variables. These equations yield approximate OC and ASN
curves for probability-ratio tests for a simple hypothesis against a single alternative concerning
some of the more usual stationary Markoff chains. Bounds for the amount by
which the ASN exceeds that of the most efficient test are also obtained. The results are
applied in particular to random variables taking on only the values 0, 1 with conditional
probabilities depending only on a finite number of the preceding observations. The case
of linear dependence of normal random variables with fixed conditional variance is also
considered.

16. Some Aspects of Links Between Prediction Problems and Problems of 
Statistical Estimation. Erling Sverdrup, University of Oslo. 

A prediction is not taken as a probability statement about additional observations of
the random variable already observed. It is presumed that the statistical interpretation
of the sample will result in some action influencing the random variable subject to prediction.
The probability distribution of this random variable is given for each of an a priori
class of probability functions for the observed random variable and for each of a class of
possible actions. "Utility" as a function of the random variable to be predicted and of the
action is defined. It is shown that the problem of which action to take in order to maximize
expected utility is identical with a problem of statistical inference with a uniquely defined
weight function in the Wald sense. It is further shown that this procedure is adaptable to
stochastic processes of a general type and this provides a means of connecting the theory
of stochastic processes with the theory of statistical inference. Some examples are given to
illustrate the general theory.

17. Some Large Sample Tests for the Median. John E. Walsh, The Rand
Corporation, Santa Monica, California.

Consider a large number of independent observations from continuous populations with
a common median. Some non-parametric large sample tests for the population median are
presented which are based on either two or three order statistics of the sample. If all the
populations are symmetrical, these tests are equal-tailed with specified significance level $\alpha$.
If the observations are a sample from a normal population, these tests have high power
efficiencies. Some tests based on three order statistics are developed which also have significance
level $\alpha$ if all the populations are not symmetrical; however, in this case the resulting
test is one-tailed instead of equal-tailed. Using these tests for situations where the populations
are believed to be symmetrical furnishes a safety factor with respect to Type I error.
Tests are presented for the special case where each population is either symmetrical or
skewed in a specified direction. If the populations are not symmetrical the significance
level distribution is $.4\alpha$ to one tail and $.6\alpha$ to the other, rather than $.5\alpha$ to each tail. Also
some non-parametric large sample tests of whether a sample is from a symmetrical population
are derived. These tests are based on three order statistics of the sample and have
bounded significance levels.

18. Continuous Sampling Plans from the Risk Point of View. Zivia S. Wurtele,
Stanford University, California.

The quality of a lot can be improved by a screening process whereby the defective items
found during inspection are replaced by non-defective items. The type of sampling plan
adopted will generally depend upon the cost of inspecting items, the number of defective
items in the lot prior to inspection, and the loss due to defective items remaining in the lot
after inspection. The loss if the lot is accepted after d defectives are found in a sample of
n items is equal to c(n) + h(D) where D is the number of defectives left in the lot and c(n)
is the cost of inspecting n items. An inspection procedure S is defined by a set of stopping
points {(d, n)}. Let r(p, S) be the expected loss if p is the probability of a defective and
the procedure S is used. It is assumed that the lot is obtained from a binomial population.
For any a priori distribution F(p), a Bayes procedure is one which minimizes the expected
risk,

$$\int_0^1 r(p, S)\,dF(p).$$

A systematic method of obtaining Bayes solutions exists, but the computations are formidable.
Under fairly general conditions the Bayes solutions are shown to be multiple sampling
plans, in which the size of the ith sample depends upon the number of defectives in the
(i - 1)st sample. In particular, if the production is in a state of statistical control, a
Bayes solution is a fixed sample size. It is also shown that for most reasonable loss functions,
there exists no mini-max procedure which is uniformly better than the trivial one,
namely, the Bayes procedure if p = 1.





NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest.

Personal Items 

Dr. Irving Burr has been promoted to a full professorship at Purdue University. 

Dr. D. A. S. Fraser, who received his Ph.D. degree at Princeton University in
June, has accepted a position as Instructor of Mathematics at the University
of Toronto.

Dr. H. K. Hartline, formerly at the Johnson Research Foundation of the University
of Pennsylvania, has accepted an appointment as chairman of the Thomas
Jenkins Department of Biophysics, Johns Hopkins University.

Dr. Leo Katz has been promoted to an associate professorship in the Mathe¬ 
matics Department of Michigan State College, East Lansing, Michigan. 

Professor D. D. Kosambi of Tata Institute for Fundamental Research, Bombay,
India, served as Visiting Professor at the University of Chicago for the
Winter Quarter.

Dr. H. G. Landau has resigned his position with the Ballistic Research Labora¬ 
tories and is now a Research Associate with the Committee on Mathematical 
Biology, at the University of Chicago. 

Mr. Allen L. Mayerson, formerly an Associate in the Division of Statistics
and Research of the Institute of Life Insurance at New York City, has accepted
a position with the National Surety Corporation of New York.

Mr. Raymond P. Peterson, who has been an Assistant in the Mathematics 
Department of the University of California at Los Angeles and also a graduate 
student there, has accepted a position with the Institute for Numerical Analysis 
at Los Angeles.

Professor Edwin J. G. Pitman has returned to the Mathematics Department 
of the University of Tasmania after spending about a year and a half in the 
United States. From February to June of 1948 he was at Columbia University 
as visiting Professor of Mathematical Statistics. The rest of the time was spent
at North Carolina and Princeton.

A. Ananthapadmanabha Rau has returned to India after studying at the
Statistical Laboratory in Ames, Iowa. In addition to heading the Department
of Statistics and Agriculture Meteorology of the Government of the State of
Mysore, India, he is working on sampling design of experiments and climatology,
and teaching statistics and climatology at the College of Agriculture.

Dr. Andrew Sobczyk of Watson Laboratories has been appointed to an 
assistant professorship at Boston University. 

Assistant Professor S. L. Thompson of Alabama Polytechnic Institute has
been promoted to an associate professorship.

William J. Youden is acting as Assistant Chief of the Statistical Engineering 
Section of the National Bureau of Standards and as special advisor to the Direc¬ 
tor on the problems of statistical and mathematical design of major experiments 
in physics, chemistry and engineering. 




Two Doctorates in Mathematical Statistics were awarded at the University of 
North Carolina in June, 1949. The recipients were Uttam Chand, who has now 
been appointed Assistant Professor of Mathematics at Boston University, and 
Ralph A. Bradley, who will be Assistant Professor of Mathematics at McGill 
University. 


The Educational Testing Service, Princeton, N. J., announces the appointment 
of Elbert Lee Hoffman and William Edward Kline as ETS Psychometric Fellows 
for 1949-50 for graduate study in psychology at Princeton University. Mr. 
Hoffman is a graduate of the University of Oklahoma, and Mr. Kline has received 
both his bachelor’s and master’s degree from Yale University. Bert F. Green, Jr. 
and Warren S. Torgerson have received reappointments as ETS Psychometric 
Fellows. Each Fellow carries a full program of graduate study in psychology at
Princeton University, including basic work in experimental and theoretical 
psychology. Special training is also given in mathematical statistics and modern 
quantitative methods as applied to psychological problems in such fields as 
learning, testing and attitude measurement, as well as in the techniques of 
developing aptitude and achievement tests. In addition to the graduate program 
in psychology, each Fellow spends part-time in training and research work with 
the Educational Testing Service. 


Preliminary Actuarial Examinations 
Prize Awards 

The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1949 Pre¬ 
liminary Actuarial Examinations are as follows: 

First Prize of $200

Moran, Joseph W .... Yale University 


Additional Prizes of $100

Farmer, Thurston P., Jr. .... State University of Iowa
Haakenstad, Dale L. ......... University of Michigan
Hauke, William V. ........... University of Michigan
Lordan, Joseph D. ........... Massachusetts Institute of Technology
Mayberry, John P. ........... University of Toronto
Murch, Alan D. .............. University of Toronto
White, William A. ........... Dartmouth College
Zemach, Ariel ............... Harvard University


The Society of Actuaries has authorized a similar set of nine prize awards 
for the 1950 Examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three 
examinations: 

Part 1. Language Aptitude Examination. 

(Reading comprehension, meaning of words and word relationships, antonyms,
and verbal reasoning.)




Part 2. General Mathematics Examination.

(Algebra, trigonometry, coordinate geometry, differential and integral calculus.)

Part 3. Special Mathematics Examination.

(Finite differences, probability and statistics.)


The 1950 Preliminary Actuarial Examinations will be administered by the
Educational Testing Service at centers throughout the United States and
Canada on May 19, 1950. The closing date for applications is March 15, 1950.
Detailed information concerning the Examinations can be obtained from:

The Society of Actuaries
208 South LaSalle Street
Chicago 4, Illinois


New Members 

The following persons have been elected to membership in the Institute
(March 1, 1949 to May 31, 1949)

Alcantara de Oliveira, Eduardo, Ph.D. (Univ. de São Paulo) Professor, Faculdade de Filosofia,
University of São Paulo, Rua Sergipe, 00-Ap. 8£, São Paulo, Brazil.

Ashby, Wallace L., A.B. (George Washington Univ.) Agricultural Statistician, 37J,6 Jocelyn
Street, Washington 15, D. C.

Bailey, Edward W., B.Ch. (Ohio State Univ.) Quality Control Supervisor, Carbide and
Carbon Chemicals Corporation, Y-12 Plant, lOi May Inn Amir, Oak Ridge, Tennessee.

Berger, Agnes P., Ph.D. (Budapest) 10 Park Avenue, New York, New York.

Brown, Walter C., B.S. (Colorado A & M College) Graduate Assistant, Department of Mathematics,
University of Oklahoma, 1180 Trout, Norman, Oklahoma.

Calvin, Lyle D., B.S. (Univ. of Chicago) Research Graduate Assistant, Institute of Statistics,
North Carolina State College, Raleigh, North Carolina.

Carlyle, Charles G., B.S. (Univ. of Illinois) Graduate student at University of Illinois,
C-SS Stadium Terrace, Champaign, Illinois.

Chen, Yu-nlen, M.A. (Harvard) Graduate student, Harvard University, l/fl-60, Apt. D,
Charier Road, Jamaica %, New York.

Clark, Fred J., Jr., B.S. (Colorado A & M College) Graduate Assistant at University of
Illinois, Department of Mathematics, <>1 A Court 0, Stadium Terrace, Champaign,
Illinois.

Cohen, Samuel E., M.A. (Univ. of Pennsylvania) Statistician, U. S. Bureau of Labor Statistics,
J,9 Galveston St., S.W., Washington SO, D. C.

Cole, Randal H., Ph.D. (Univ. of Wisconsin) Associate Professor, University of Western
Ontario, London, Canada.

Comrey, Andrew L., Ph.D. (Univ. of Southern Calif.) Assistant Professor of Psychology,
University of Illinois, Urbana, Illinois.

Cook, Ellsworth B., B.S. (Springfield College) Head of Visual Screening Devices Research
and Statistics Facility, U. S. Naval Medical Research Laboratory, Box Ifi, Submarine
Base, New London, Connecticut.

Cox, David R., Ph.D. (Leeds, England) Statistician, Wool Industries Research Association,
Z Sunset Avenue, Leeds S, Yorks, England.

Denbow, Carl H., Ph.D. (Univ. of Chicago) Associate Professor of Mathematics, U. S.
Naval Postgraduate School, Annapolis, Maryland.

Dillon, Gregory M., A B. (Long Island Umv ) Statistician, Pension Statistics Section, 



NEWS AND NOTICES 


473 


Treasury Department, E I DuPont de Nemours & Co ,1 SSI Cedar Street, Wilmington, 
Delaware, 

Duarte, Geraldo Garcia, Licenciado em Matematica (Faculdade de Filosofia de S. Bento)
Assistente da Faculdade de Higiene e Saude Publica, Caixa Postal 99B, Sao Paulo,
Brazil.

Dudman, John A., B.A. (Reed College) Graduate student, Columbia University, 56 West
70th St., New York 38, New York.

Edelson, Howard, B.A. (Ohio State Univ., Columbus, Ohio) Graduate student and Graduate
Assistant, Ohio State University, 7 94 S. 18th St., Columbus 6, Ohio.

Feron, R., Licencié ès Sciences (Univ. of Paris) Attaché de Recherche, 13 rue des
Feuillantines, Paris V, France.

Franck, Edward Michel, Ingénieur A.I.A., Professor of the Royal Military School, 104 Rue
Pere Devroye, Woluwe St. Pierre, Belgium.

Garritsen, Florence M., B.A. (Univ. of Michigan) Research Assistant, General Motors
Corp., 6151 Lillibridge Ave., Detroit 18, Michigan.

Gelsomini, Thea, Ph.D. (Univ. of Bocconi, Milano) Assistant of Statistics at Department
of Statistics, University of Bocconi, Via A. Stoppi, N. 10, Milano, Italy.

Goudswaard, G., Ph.D. (Univ. of Leiden) Director, Permanent Office, International
Statistical Institute and Lecturer of Statistics, Rotterdam School of Economics and Free
University of Amsterdam, 2 Oostduinlaan, The Hague, Netherlands.

Gucker, Frank Fulton, A.B. (Harvard Univ.) Statistical Engineer, Remington Arms Co.,
Inc., 8175 Main Street, Bridgeport 6, Connecticut.

Haberman, Sol, B.A. (Brooklyn College) Assistant Visiting Professor of Sociology,
University of Puerto Rico, 187 Avenida los Flamboyanes, Rio Piedras, Puerto Rico.

Helmbach, Ernest E., M.B.A. (New York Univ.) Professor of Economics, Bergen College,
Teaneck, New Jersey, 56 West 11th Street, New York 11, N. Y.

Ishii, Shigeru, B.A. (Univ. of Ill.) Student at University of Illinois, 820-1 Peabody Drive,
Parade Ground Units, Champaign, Illinois.

Jackson, James Edward, M.A. (Univ. of N. C.) Statistician, Color Control Dept., Eastman
Kodak Company, 200 Pershing Drive, Rochester, New York.

Jaspen, Nathan, Ph.D. (Pennsylvania State College) Research Assistant, Department of
Psychology, Pennsylvania State College, State College, Pennsylvania.

Jonhagen, Sven, Fil. Lic. (Univ. of Stockholm) Chief Actuary and Assistant Teacher in
Statistics at the University of Stockholm, Tegnergatan 86, Stockholm, Sweden.

Kiefer, Jack C., M.S. (Mass. Inst. of Tech.) Student, Department of Mathematical
Statistics, Columbia University, 3826 Middleton Avenue, Cincinnati 20, Ohio.

Kraft, Charles Hall, B.A. (Mich. State College) Instructor, Mathematics Department,
Michigan State College, 707D Chestnut Road, East Lansing, Mich.

Mayne, John W., M.Sc. (Brown Univ.) Graduate student in Mathematical Statistics, 330
Furnald Hall, Columbia University, New York 27, New York.

McCabe, William J., M.A. (George Washington Univ.) Chief Statistician, Transportation
Corps, Department of the Army, 1725 South Oakland Street, Arlington, Virginia.

Medin, Knut H., M.A. (Univ. of Uppsala) Assistant, Statistical Institute, University of
Uppsala, Odinslund 2, Uppsala, Sweden.

Mewborn, A. Boyd, Ph.D. (Calif. Inst. of Tech.) Associate Professor of Mathematics and
Mechanics, P.O. Box 1748, Monterey, California.

Minton, Paul D., M.S. (Southern Methodist Univ.) Graduate student, University of North
Carolina, P.O. Box 634, Chapel Hill, North Carolina.

Morris, Doris N., M.A. (Columbia Teachers College) Economics Assistant, Western Electric
Co., 101 West 72nd St., New York 23, New York.

Morris, Robert H., B.A. (Swarthmore) Development Engineer, Color Control Department,
Eastman Kodak Co., Rochester, New York.







Rajalakshman, D. V., M.Sc. (Madras Univ.) Head, Department of Statistics, University
of Madras, Madras 6, S. India.

Rudy, Norman, M.B.A. (Univ. of Chicago) Scientist, Ordnance Research Project,
University of Chicago and Instructor in Economics, Roosevelt College, 7105 S. Crandon
Avenue, Chicago 16, Illinois.

Sakk, Kaarel, Fil.kand. (Univ. of Stockholm) Officer at Research Bureau of the State
Foodstuffs Commission, Ostermalmsgatan 67 o.g. Ill, Stockholm, Sweden.

Singh, Jagjit, B.A. (Punjab Univ.) Superintendent Transportation, E. I. Railway, Dinapore,
c/o B,S, BuggaEstir,, Post Office Calcutta, India.

Starr, Henry H., Ph.D. (Univ. of Vienna) Research Manager, Converted Rice, Inc., P.O.
Box 1762, Houston 1, Texas.

Sverdrup, Erling, Actuarian (Univ. of Oslo) Lecturer in Mathematical Statistics, Institute
of Mathematics, University of Oslo, Oslo, Norway.

Talacko, Joseph Y., Ph.D. (Charles Univ., Prague) Assistant Professor of Mathematics,
Marquette University, BS So. 10th Street, Milwaukee 7, Wisconsin.

Templeton, James G. C., A.M. (Princeton Univ.) Graduate student at Princeton
University, Fine Hall, Princeton University, Princeton, New Jersey.

Vaughan, Elizabeth, B.S. (Univ. of Washington) Statistician, U. S. Fish and Wildlife Service,
2725 Montlake Boulevard, Seattle 2, Washington.

Wilkinson, Bryan, M.A. (Univ. of Nebraska) Personnel Research Specialist, Prudential
Insurance Co., Western Home Office, 1/30-B Muirfield Road, Los Angeles IS,
California.

Yevick, Mariam A. L., Ph.D. (Mass. Inst. of Tech.) Staff, Division of Statistical
Engineering, National Broadcasting System, 11/ Hudson St., Hoboken, New Jersey.



REPORT ON THE BERKELEY MEETING OF THE INSTITUTE 


The thirty-ninth meeting and fifth regional West Coast meeting of the Institute
of Mathematical Statistics was held on the Berkeley campus of the
University of California, from Thursday June 16 through Saturday June 18,
1949. The session on June 17 was held jointly with the Biometrics Section of the
American Statistical Association and the Biometric Society (Western N. A.
Region). Sixty-six persons registered, including the following fifty members of
the Institute:

Jane F. Andrian, G. A. Baker, Z. Wm. Birnbaum, Colin R. Blyth, Albert H. Bowker,
Paul T. Bruyere, Chin Long Chiang, Edwin L. Crow, John H. Curtiss, R. C. Davis, Carl
H. Denbow, W. J. Dixon, Mary Elveback, Mark Eudey, Edward A. Fay, Evelyn Fix,
William R. Gaffey, H. H. Germond, M. A. Girshick, Jack Gysbers, Max Halperin, J. L.
Hodges, Jr., John M. Howell, Harry M. Hughes, Cuthbert Hurd, Terry A. Jeeves, Mark
Kac, H. S. Konijn, George M. Kuznets, Erich L. Lehmann, Richard F. Link, Michel Loève,
Frank Massey, Lincoln E. Moses, Edith Mourier, Stanley W. Nash, J. Neyman, Edward
Paulson, Stefan Peters, Raymond P. Peterson, Robert I. Piper, Gladys Rappaport, Mina
Rees, David Rubinstein, Elizabeth L. Scott, Esther Seiden, Charles M. Stein, John E.
Walsh, John Wishart, Zivia S. Wurtele.

Those attending were welcomed at the Thursday morning session by Edward 
W. Strong, Associate Dean of the College of Letters and Science, University of 
California. Professor Z. William Birnbaum of the University of Washington 
presided. 

The program was as follows: 

1. Recent advances in the theory of the Wishart distribution. (Invited paper.) 
John Wishart, Cambridge University. 

2. Bayes, minimax, and other approaches to the multiple classification problem.
(Invited paper.) M. A. Girshick, Stanford University.

3. Some problems in sequential analysis. Charles M. Stein, University of 
California, Berkeley. 

Professor Jerzy Neyman of the University of California, Berkeley, presided 
at the Thursday afternoon session. Midway in the program there was an inter¬ 
mission for a tea given by the Statistical Laboratory, University of California. 
The program was as follows: 

1. Completeness in the sequential case. E. L. Lehmann and C. M. Stein, University of
California, Berkeley.

2. Some large sample tests for the median. John E. Walsh, The Rand Corporation.

3. Continuous sampling plans from the risk point of view. Zivia S. Wurtele, Stanford
University.

4. Some problems in point estimation. J. L. Hodges, Jr. and E. L. Lehmann, University
of California, Berkeley.

5. Minimum variance in non-regular estimation. R. C. Davis, U. S. Naval Ordnance Test
Station, Inyokern.

6. Some aspects of links between prediction problems and problems of statistical estimation.
Erling Sverdrup, University of Oslo.

7. Extension of a theorem of Blackwell. (By title). Edward W. Barankin, University of
California, Berkeley.

8. Some two-sample tests. (By title). Douglas G. Chapman, University of California,
Berkeley.

9. On the existence of consistent tests. (By title). Agnes Berger, Columbia University.

Professor F. W. Weymouth of Stanford University presided at the Friday
morning session on biometrics. The program was as follows:

1. Statistical problems arising from research in tuberculosis. Martha and Paul T. Bruyere,
U. S. Public Health Service.

2. Correlation of variability with growth rate in fish and mollusks. F. W. Weymouth,
Stanford University.

3. Some problems arising in plant selection and the use of analysis of variance. Stanley
W. Nash, University of California, Berkeley.

4. Studies of resistance of strawberry varieties and selections to verticillium wilt. R. E. Baker
and G. A. Baker, University of California, Davis.

5. A uniformity trial on unirrigated barley of ten years duration with implications for
field trial designs. F. J. Veihmeyer, M. R. Huberty, and G. A. Baker, University of
California, Davis and Los Angeles.

On Friday afternoon those attending the meeting were entertained at a picnic 
luncheon at Stanford University, given by the Department of Statistics, Stanford 
University. 

Professor C. B. Morrey, Jr., of the University of California, Berkeley, presided 
at the Saturday morning session. The program consisted of the following invited 
papers: 

1. Methods for getting limiting distributions. Mark Kac, Cornell University. 

2. Almost certain convergence. Michel Loève, University of California, Berkeley.

At 11 o'clock Saturday morning a business session was held, under the
chairmanship of Professor Jerzy Neyman of the University of California, Berkeley,
for the purpose of discussing future West Coast meetings. Plans for reviving the
Statistical Research Memoirs were also discussed.

On Saturday afternoon a final session for contributed papers was held under
the chairmanship of Professor Albert H. Bowker of Stanford University. The
program was as follows:

1. Effect of linear truncation in a multinomial population. Z. William Birnbaum,
University of Washington.

2. Estimation in truncated samples. Max Halperin, The Rand Corporation.

3. On the similar regions of a class of distributions. Stefan Peters, University of California,
Berkeley.

4. Auxiliary random variables. Mark W. Eudey, California Municipal Statistics.

5. The ratio of ranges. Richard F. Link, University of Oregon.

6. Statistical problems in the theory of Geiger counters. Colin R. Blyth, University of
California, Berkeley.

7. Asymptotic properties of the Wald-Wolfowitz test of randomness. (By title). Gottfried
E. Noether, Columbia University.

J. L. Hodges, Jr. 
Assistant Secretary 



LOCALLY BEST UNBIASED ESTIMATES¹

By E. W. Barankin
University of California, Berkeley

Summary. The problem of unbiased estimation, restricted only by the postulate
of section 2, is considered here. For a chosen number $s > 1$, an unbiased estimate
of a function $g$ on the parameter space is said to be best at the parameter
point $\theta_0$ if its $s$th absolute central moment at $\theta_0$ is finite and not greater than that
for any other unbiased estimate. A necessary and sufficient condition is obtained
for the existence of an unbiased estimate of $g$. When one exists, the best one is
unique. A necessary and sufficient condition is given for the existence of only
one unbiased estimate with finite $s$th absolute central moment. The $s$th absolute
central moment at $\theta_0$ of the best unbiased estimate (if it exists) is given explicitly
in terms of only the function $g$ and the probability densities. It is, to be more
precise, specified as the l.u.b. of a certain set of numbers. The best estimate is
then constructed (as a limit of a sequence of functions) with the use of only the
data (relating to $g$ and the densities) associated with any particular sequence
in that set which converges to its l.u.b.

The case $s = \infty$ is considered apart. The case $s = 2$ is studied in greater
detail. Previous results of several authors are discussed in the light of the present
theory. Generalizations of some of these results are deduced. Some examples
are given to illustrate the applications of the theory.

1. Introduction. Let $\Omega$ be a space of points $x$, and $\mu$ be a totally additive
measure defined on a $\sigma$-field $\mathcal{F}$ of subsets of $\Omega$. Let $\mathfrak{P} = \{p_\theta,\ \theta \in \Theta\}$ be a family of
probability densities in $\Omega$ with respect to the measure $\mu$. $\Theta$ is any index set; we
lay down no conditions on its structure. We are concerned here with the existence
and characterization of unbiased estimates of a real-valued function $g$ on $\Theta$,
which are in some suitable sense "best" for a prescribed parameter point $\theta_0$.
That is, a real-valued, measurable ($\mu$) function $f_0$ on $\Omega$ such that

(1)   $\int_\Omega f_0\, p_\theta\, d\mu = g(\theta), \qquad \theta \in \Theta,$

and which satisfies a specified criterion of bestness for $\theta = \theta_0$. This criterion is
usually taken to be

(2)   $\int_\Omega (f_0 - g(\theta_0))^2\, p_{\theta_0}\, d\mu \le \int_\Omega (f - g(\theta_0))^2\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M},$

where $\mathfrak{M}$ denotes the class of all unbiased estimates of $g$; i.e., the class of all $f$
satisfying (1). The obvious advantage in the definition (2) is the algebraic
pliability. The obvious disadvantage is that $\mathfrak{M}$ may contain no estimate with
finite variance (cf. section 9).

¹ This article was prepared while the author was under contract with the Office of Naval
Research.

For the investigation of the fundamental questions, posed above, relating to
unbiased estimates, we shall not restrict ourselves to (2). We consider chosen
and fixed, a number $s > 1$, and lay down the

Definition. $f_0 \in \mathfrak{M}$ is best at $\theta_0$ if

$\infty > \int_\Omega |f_0 - g(\theta_0)|^s\, p_{\theta_0}\, d\mu \le \int_\Omega |f - g(\theta_0)|^s\, p_{\theta_0}\, d\mu, \qquad f \in \mathfrak{M}.$

With this, and under the condition of a rather natural postulate on $\mathfrak{P}$ (cf. section
2), we exhibit a necessary and sufficient condition for the existence of an unbiased
estimate of $g$ having a finite $s$th absolute central moment at $\theta_0$.²

Except for the discussion, in section 3, of the case in which $g$ is constant on $\Theta$,
we do not consider directly the estimation of $g$, but rather that of $h = g - g(\theta_0)$.
Lemma 1, of section 2, gives the solution of the problem for $g$ when that for $h$ is
known. After section 3, it is assumed exclusively that $h$ is not $\equiv 0$, except where
the contrary is explicitly stated.

In case $s$ is finite, the existence theorem, section 4, Theorem 2, asserts also the
uniqueness of the best unbiased estimate of $h$. It is interesting to observe the
similarity between the proof of this uniqueness and Fisher's proof of the (what
might be called) asymptotic uniqueness of an efficient estimator [2, pp. 704, 705].
The case $s = \infty$³ is discussed in section 5; in this case we find that, in general,
the best estimate is not unique. However, for $s$ both finite and infinite, and as
well when $g$ is constant ($\therefore h \equiv 0$), we give a necessary and sufficient condition
that there be a unique unbiased estimate with finite s.a.c.m.⁴ (cf. section 4,
Corollary 2-1, and section 5, Theorem 3 (iii)).

Theorem 2 determines the s.a.c.m. of the best estimate as the l.u.b. of a set of
numbers given explicitly; and thereby, in particular, throws open the class of
all lower bounds of the minimum s.a.c.m. Investigations after such lower bounds,
in the classical case $s = 2$, have led to the well-known results of Cramér-Rao
[3, p. 480, (32.3.3)], and Bhattacharyya [4, p. 3, (1.10)]. In section 6, which is
devoted to obtaining various special lower bounds, we show how those particular
bounds fall out. It should be remarked, however, that our conditions on $\mathfrak{P}$ are
in general different from those of the above authors.

² For the case $s = 2$ an alternative existence condition, antedating these results, but not
yet published, has been obtained by C. Stein.

³ If we use, in the above definition, the $s$th root of the $s$th absolute central moment,
instead of the latter itself, then the bestness criterion for $s = \infty$ is the limiting criterion
for $s \to \infty$; viz.,

$\infty > \operatorname*{ess\,sup}_{x \in \Omega} |f_0 - g(\theta_0)| \le \operatorname*{ess\,sup}_{x \in \Omega} |f - g(\theta_0)|, \qquad f \in \mathfrak{M},$

where ess. sup. refers to the measure $\nu(A) = \int_A p_{\theta_0}\, d\mu$.

⁴ The abbreviation s.a.c.m. will henceforth be used to indicate $s$th absolute central
moment at $\theta_0$.





In section 7 we give, in Theorem 7 and its corollary, a construction of the
best estimate, depending only on the knowledge of the minimum s.a.c.m. The
latter, as indicated in the preceding paragraph, is always known independently
of any knowledge of the best estimate. We use these results to obtain explicitly
(Theorems 8 and 9) the best estimates, for arbitrary $s$, in two cases where
we assume the minimum s.a.c.m. known. These cases, when $s = 2$, give the
minimum variance as determined by the equality sign in the Cramér-Rao and
Bhattacharyya inequalities, respectively.

Section 8 is given to a brief discussion of the special case $s = 2$. Finally, in
section 9, we present a detailed study of an example.

At the suggestion of the referee we have added an appendix in which is given a
brief running description of the fundamental ideas of Banach spaces that come
into use here. The italicized phrases are those mentioned explicitly in the course
of the paper.

We shall merely mention here certain points which will be elaborated further
in future communications. (1) The general theory developed here pertains as
well to sequential as to nonsequential estimation; one has only to make the
proper identification of $\Omega$, $\mathcal{F}$, $\mu$, and $\mathfrak{P}$. Moreover, as applied to sequential
estimation, the theory will determine the optimum stopping regions. (2) The
discussion of section 5 below can be carried through with "ess. sup." referring
to the measure $\mu$, and $\mathfrak{L}_1$ being the space of functions on $\Omega$ which are integrable
($\mu$); and for this, no restrictions whatsoever on the densities $p_\theta$ are required
(cf. the postulate of section 2), since the $p_\theta$ are elements of this $\mathfrak{L}_1$ solely by
virtue of their properties as probability densities. This development would, for
example, be sufficient to yield the estimate of Girshick, Mosteller, and Savage [5]
in the case of sequential binomial estimation. Also, this unrestricted analysis is
fundamental for the problem of similar regions (a case of the bounded unbiased
estimation of a constant function). (3) For any $s > 1$ it may be observed in the
result of Theorem 7 below, that the best (at $\theta_0$) estimate depends only on a
sufficient statistic; this is clear from Neyman's theorem on sufficient statistics
[6], since the best estimate depends only on ratios of the density functions $p_\theta$.
But more than this, Blackwell's method [7] of deriving a uniformly (over the
parameter set) better unbiased estimate from a given unbiased estimate can be
proved to remain valid also when the measure of dispersion is the $s$th absolute
central moment, $s > 1$. And for this, the postulate of section 2 is not required.
(4) Finally, we point out that, with the proper specializations of $\Theta$, Cramér's
theorem on the ellipsoid of concentration [8], Bhattacharyya's multidimensional
inequality [9], and the extensions of the Rao, Cramér, and Bhattacharyya
bounds to sequential estimation (as, for example, by Blackwell and Girshick
[1], Wolfowitz [10], and Seth [20]) can be drawn from Theorem 4 below.

The inspiration for the mode of analysis in the following pages, and the
major part of its substance, come from F. Riesz: his book [11, Ch. III] and the
article [12] (in particular sections 8-11 thereof). In strictly mathematical
terminology, Theorems 2 and 3 are given in [11] for the sequence-spaces $l_r$; and
Theorem 2 in [12] for the spaces $\mathfrak{L}_r$ of functions on the real interval [0, 1] with
Lebesgue integrable $r$th powers. The proofs are given there for the case of a
denumerable set $\Theta$; in [12] an indication is given of the extension to a
non-denumerable $\Theta$. Our proof of Theorems 2 and 3, however, follows that given by
Banach [13, p. 74] for the case of denumerable $\Theta$. It is based on two results, a
theorem of Hahn-Banach [13, p. 55, Theorem 4], and the representation theorem
(suitable for the general type of $\mathfrak{L}_r$ that we consider) for bounded linear
functionals on $\mathfrak{L}_r$ [14, p. 338, Theorem 46]. The first of these, and the representation
theorem for any $r > 1$, spring in fact from the same article [12, p. 475] of Riesz.
In the case $r = 1$, the representation theorem is due originally to Steinhaus [15];
in the case $r = 2$, it was developed simultaneously in 1907 by Riesz [16] and
Fréchet [17].

Riesz' proofs of the sufficiency of the condition in Theorem 2 proceed by
constructing an explicit sequence of functions on $\Omega$, which converge strongly in
$\mathfrak{L}_s$ to the (in the present statistical terminology) best estimate. Precisely, if in
Theorem 7 below, we take, for each $n = 1, 2, \cdots$, the numbers $a_1^n, a_2^n, \cdots, a_n^n$
so that the expression

$\dfrac{\left| \sum_{i=1}^{n} a_i^n\, h(\theta_i) \right|}{\left\| \sum_{i=1}^{n} a_i^n\, \pi_{\theta_i} \right\|_r}$

is maximum, then the assertion of this theorem is that of Riesz. However,
Theorem 7 is established here without this strict requirement on the $a_i^n$. The
dropping of this restriction was essential for the proofs of Theorems 8 and 9.
The latter two theorems are, in fact, proved with the use of Corollary 7-1,
which is an even stronger result than Theorem 7. This corollary falls out of the
proof of Theorem 7 immediately, in consequence of our use of Lemma 2 for that
proof. The lemma, moreover, eases the proof of Theorem 7 markedly, in doing
away with the need for any differentiation.

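2. Preliminary considerations. We begin then by introducing the absolutely
continuous (with respect to $\mu$) measure, defined on $\mathcal{F}$,

$\nu(A) = \int_A p_{\theta_0}\, d\mu, \qquad A \in \mathcal{F}.$

A function $\phi$ is summable ($\nu$) over $\Omega$ if and only if $\phi \cdot p_{\theta_0}$ is summable ($\mu$) over
$\Omega$; and we have

$\int_\Omega \phi\, d\nu = \int_\Omega \phi\, p_{\theta_0}\, d\mu$

(cf. [18, pp. 36-38]). Assuming that each of the ratios

$\pi_\theta(x) = \dfrac{p_\theta(x)}{p_{\theta_0}(x)}, \qquad \theta \in \Theta,$

is defined almost everywhere ($\mu$) throughout $\Omega$, it follows that $f$ is an unbiased
estimate of $g$ if and only if

(3)   $\int_\Omega f\, \pi_\theta\, d\nu = g(\theta), \qquad \theta \in \Theta.$

We define

$h(\theta) = g(\theta) - g(\theta_0), \qquad \theta \in \Theta.$

Since

$\int_\Omega \pi_\theta\, d\nu = 1, \qquad \theta \in \Theta,$

it is clear from (3) that $f$ is an unbiased estimate of $g$ if and only if $f - g(\theta_0)$ is
an unbiased estimate of $h$. Moreover, $f$ is best, for $g$, at $\theta_0$ if and only if $f - g(\theta_0)$
is best, for $h$, at $\theta_0$.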
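The passage from (1) to (3) can be checked numerically on a small discrete case. The sketch below is only an illustration under assumptions that are not in the paper: it takes $\Omega = \{0, 1, \cdots, n\}$ with counting measure for $\mu$, a binomial family for $\mathfrak{P}$, and $g(\theta) = \theta$; all function names are hypothetical.

```python
from math import comb

def binom_pmf(n, theta):
    """Binomial(n, theta) density with respect to counting measure on {0, ..., n}."""
    return [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]

n, theta0 = 6, 0.4
p0 = binom_pmf(n, theta0)   # p_{theta_0}; the measure nu puts mass p0[x] on the point x

def f(x):
    """The usual unbiased estimate of g(theta) = theta."""
    return x / n

def mean_under_mu(theta):
    """Left side of (1): integral of f * p_theta with respect to mu."""
    return sum(f(x) * p for x, p in enumerate(binom_pmf(n, theta)))

def mean_under_nu(theta):
    """Left side of (3): integral of f * pi_theta with respect to nu."""
    p = binom_pmf(n, theta)
    return sum(f(x) * (p[x] / p0[x]) * p0[x] for x in range(n + 1))

def total_mass_of_ratio(theta):
    """Integral of pi_theta with respect to nu, which should equal 1."""
    p = binom_pmf(n, theta)
    return sum((p[x] / p0[x]) * p0[x] for x in range(n + 1))
```

For any $\theta$ strictly between 0 and 1, `mean_under_mu(theta)` and `mean_under_nu(theta)` should agree up to rounding, and `total_mass_of_ratio(theta)` should be 1 up to rounding.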

Define

$r = \dfrac{s}{s - 1},$

and let $\mathfrak{L}_r$ and $\mathfrak{L}_s$ be the spaces, normed in the usual way, of real-valued functions
on $\Omega$, with summable ($\nu$) absolute $r$th and $s$th powers, respectively. We denote
the respective norms by $\|\cdot\|_r$ and $\|\cdot\|_s$; that is, if $\xi \in \mathfrak{L}_r$ and $\eta \in \mathfrak{L}_s$,

$\|\xi\|_r = \left( \int_\Omega |\xi|^r\, d\nu \right)^{1/r}$

and

$\|\eta\|_s = \left( \int_\Omega |\eta|^s\, d\nu \right)^{1/s}.$

We note that these spaces, for $s < \infty$, are weakly compact (cf. [21]). This
property will be used in the proof of Theorem 7. Also, we shall make explicit use
of the representation theorem for linear functionals on $\mathfrak{L}_r$ [14, p. 338, Theorem 46].
The assumptions on $\mathfrak{P}$, or on $\mathfrak{P}_0 = \{\pi_\theta,\ \theta \in \Theta\}$, will now be the following.
Postulate: The functions $\pi_\theta$ are defined almost everywhere ($\mu$) in $\Omega$, and are
elements of $\mathfrak{L}_r$.

The foregoing considerations combine to give the following equivalence.

Lemma 1. $\phi_0 + g(\theta_0)$ is an unbiased estimate of $g$, which is best at $\theta_0$, if and
only if (i) $\phi_0$ satisfies the equations

(4)   $\int_\Omega \phi_0\, \pi_\theta\, d\nu = h(\theta), \qquad \theta \in \Theta,$

and (ii) when $\phi$ is any other function satisfying (4), we have

$\infty > \int_\Omega |\phi_0|^s\, d\nu \le \int_\Omega |\phi|^s\, d\nu;$

that is, if and only if $\phi_0$ is an unbiased estimate of $h$ with minimum (finite) norm in
$\mathfrak{L}_s$. The s.a.c.m. of $\phi_0 + g(\theta_0)$ is precisely $\|\phi_0\|_s^s$.

Starting with section 4, we shall deal directly with the estimation of h. 

3. The case of constant g. Throughout the remainder (section 4 et seq.) of
this article, the function $h$ is assumed, unless the contrary is explicitly stated,
to be non-constant; that is, since $h(\theta_0) = 0$, not $\equiv 0$. We can, and shall in this
section, obtain the results of the desired kind for the case of a constant function $g$,
by a brief, direct attack.

Let $g(\theta) \equiv g_0$, a constant. Then of course $h(\theta) \equiv 0$. One unbiased estimate of $g$
is immediately obvious, viz., $f_1(x) \equiv g_0$. The s.a.c.m. of $f_1$ is 0.

There will exist other⁵ unbiased estimates of $g$ with finite s.a.c.m. if and
only if there exist non-null unbiased estimates, in $\mathfrak{L}_s$, of $h \equiv 0$. That is, by virtue
of the isomorphism between $\mathfrak{L}_s$ and the space of linear functionals on $\mathfrak{L}_r$, there
will exist an unbiased estimate of $g$ with finite s.a.c.m., distinct from $f_1$, if and
only if there exists a non-null functional on $\mathfrak{L}_r$ which vanishes on the elements of
$\mathfrak{P}_0 = \{\pi_\theta,\ \theta \in \Theta\}$. And a necessary and sufficient condition that such a functional
exist is that $\mathfrak{P}_0$ be not a fundamental set in $\mathfrak{L}_r$ [13, p. 58, Theorem 7].

Observe finally that, in any case, $f_1$ is the unique unbiased estimate of $g$ with
vanishing s.a.c.m.

We collect these results in the following statement.

Theorem 1. If $g(\theta) \equiv g_0$, a constant, then there is a unique best unbiased estimate
of $g$; viz., $f_1(x) \equiv g_0$. And the s.a.c.m. of $f_1$ is 0.

A necessary and sufficient condition that there exist no other unbiased estimates
of $g$ having finite s.a.c.m. is that the set $\mathfrak{P}_0$ be fundamental in $\mathfrak{L}_r$.

⁵ That is, distinct from $f_1$ in the sense of $\mathfrak{L}_s$; or, equivalently, differing from $f_1$ on a set
of positive ($\nu$) measure. Whenever, in the sequel, an equation $\xi_1 = \xi_2$ appears, for two
functions $\xi_1$ and $\xi_2$ in $\mathfrak{L}_r$ or $\mathfrak{L}_s$, equality almost everywhere ($\nu$) in $\Omega$ will be understood.
It is a consequence of our postulate that if two functions on $\Omega$ are equal almost everywhere
($\nu$), they are equal almost everywhere ($\nu'$), where $\nu'$ is any one of the measures
$\nu'(A) = \int_A p_{\theta'}\, d\mu$, $\theta' \in \Theta$.

As an illustration of the ideas of this section, consider the following example:
$\Omega$ is the real interval [0, 1]; $\mu$ is Lebesgue measure; $\Theta$ is the set of non-negative
integers; and

$p_\theta(x) = (\theta + 1)\, x^\theta.$

And take $\theta_0 = 0$. Then $\nu$ is again Lebesgue measure, and $\pi_\theta = p_\theta$ for each $\theta$.
For definiteness, take $r = 2$ (the results in this case are the same for any $r \ge 1$).
It is well-known that the non-negative integer powers of $x$ form a fundamental
set in $\mathfrak{L}_2$ on a finite real interval. That is, if $\xi$ is a function on [0, 1], such that
$\int_0^1 \xi^2\, dx < \infty$, and if $\epsilon > 0$, then there exist an integer $n$ and coefficients
$b_0, b_1, \cdots, b_n$ such that

$\int_0^1 \left( \xi - \sum_{i=0}^{n} b_i x^i \right)^2 dx < \epsilon.$

Hence, in this case an unbiased estimate with finite variance at $\theta = 0$ is unique
(as well for a non-constant function $g$ as for one which is constant over $\Theta$; cf.
section 4, Corollary 2-1).
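The approximation property invoked in this example can be observed numerically. The following is a minimal sketch, not part of the paper: it assumes the test function $\xi(x) = e^x$ (an arbitrary choice), uses orthonormal shifted Legendre polynomials in place of the raw powers $x^i$ (they span the same polynomial spaces), and integrates with a composite Simpson rule. All names are hypothetical.

```python
from math import exp, sqrt

def shifted_legendre(k, x):
    """Orthonormal Legendre polynomial of degree k on [0, 1] (three-term recurrence)."""
    if k == 0:
        return 1.0
    t = 2.0 * x - 1.0
    prev, cur = 1.0, t
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1) * t * cur - j * prev) / (j + 1)
    return sqrt(2 * k + 1) * cur

def simpson(func, m=2000):
    """Composite Simpson rule on [0, 1] with m (even) subintervals."""
    h = 1.0 / m
    total = func(0.0) + func(1.0)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(i * h)
    return total * h / 3.0

def squared_l2_error(xi, degree):
    """Squared L2[0,1] distance from xi to its best polynomial approximation of given degree."""
    coeffs = [simpson(lambda x, k=k: xi(x) * shifted_legendre(k, x))
              for k in range(degree + 1)]

    def residual_sq(x):
        approx = sum(c * shifted_legendre(k, x) for k, c in enumerate(coeffs))
        return (xi(x) - approx) ** 2

    return simpson(residual_sq)

# Squared errors for polynomial degrees 0, 1, 2, 3, 4.
errors = [squared_l2_error(exp, d) for d in range(5)]
```

The list `errors` should decrease as the degree grows, which is the behaviour the fundamental-set property guarantees in the limit for any square-integrable $\xi$.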


4. The main theorem for non-constant h. We shall denote by $\mathfrak{M}_s$ the class
(or the set in $\mathfrak{L}_s$) of all unbiased estimates of $h$ that belong to $\mathfrak{L}_s$.

Theorem 2. (i) A necessary and sufficient condition that $\mathfrak{M}_s$ be non-empty is
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \pi_{\theta_2}, \cdots, \pi_{\theta_n}$
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$,

(5)   $\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le C \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_r.$

(ii) For every $\phi \in \mathfrak{M}_s$ we have $\|\phi\|_s \ge C_0$, where $C_0$ is the g.l.b. of the set of
admissible constants $C$ in (5).

(iii) If $\mathfrak{M}_s$ is non-empty there is a unique $\phi_0 \in \mathfrak{M}_s$ with $\|\phi_0\|_s = C_0$. Thus,
$\phi_0$ is the unique unbiased estimate of $h$ which is best at $\theta_0$.

The non-constancy of $h$ clearly implies $C_0 > 0$.

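The necessity of condition (5) is immediate. Suppose $\phi \in \mathfrak{M}_s$, so that $\phi$ satisfies
equations (4); then, for any $\theta_1, \theta_2, \cdots, \theta_n$, and any real numbers $a_1, a_2, \cdots, a_n$,

$\sum_{i=1}^{n} a_i\, h(\theta_i) = \int_\Omega \phi \cdot \sum_{i=1}^{n} a_i\, \pi_{\theta_i}\, d\nu.$

By the Hölder inequality it follows that

$\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le \|\phi\|_s \cdot \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_r.$

Hence (5) is satisfied with $C = \|\phi\|_s$.

Part (ii) of the theorem is hereby proved as well.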
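The Hölder step can be illustrated on a finite case. The sketch below is an assumption-laden example rather than anything taken from the paper: $\Omega = \{0, \cdots, n\}$ with a binomial family, $h(\theta) = \theta - \theta_0$, and the particular unbiased estimate $\phi(x) = x/n - \theta_0$ of $h$. The function names and the numbers passed to them are hypothetical.

```python
from math import comb

def binom_pmf(n, theta):
    """Binomial(n, theta) probabilities on {0, ..., n}."""
    return [comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(n + 1)]

def nu_norm(values, weights, p):
    """p-th norm of a function on a finite space with respect to the weights."""
    return sum(w * abs(v)**p for v, w in zip(values, weights)) ** (1.0 / p)

def holder_sides(n, theta0, thetas, coeffs, s):
    """Return (left side of (5), ||phi||_s * ||sum a_i pi_i||_r) for phi(x) = x/n - theta0."""
    r = s / (s - 1.0)
    nu = binom_pmf(n, theta0)                       # nu has density p_{theta_0}
    phi = [x / n - theta0 for x in range(n + 1)]    # unbiased for h(theta) = theta - theta0
    dens = [binom_pmf(n, t) for t in thetas]
    # The combination sum_i a_i * pi_{theta_i}, evaluated at each point x.
    combo = [sum(a * d[x] / nu[x] for a, d in zip(coeffs, dens)) for x in range(n + 1)]
    lhs = abs(sum(a * (t - theta0) for a, t in zip(coeffs, thetas)))
    rhs = nu_norm(phi, nu, s) * nu_norm(combo, nu, r)
    return lhs, rhs

lhs, rhs = holder_sides(5, 0.3, [0.1, 0.5, 0.8], [1.0, -2.0, 0.7], 3.0)
```

Whatever parameter points and coefficients are supplied, `lhs` should not exceed `rhs`, which is (5) with $C = \|\phi\|_s$.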

Suppose $\mathfrak{M}_s$ non-empty, and $\phi_0$, $\phi_1$ in $\mathfrak{M}_s$, such that $\|\phi_0\|_s = \|\phi_1\|_s = C_0$.
Then $\tfrac{1}{2}(\phi_0 + \phi_1) \in \mathfrak{M}_s$ and therefore, by part (ii),

$\tfrac{1}{2} \|\phi_0 + \phi_1\|_s \ge C_0.$

But, by the Minkowski inequality,

$\tfrac{1}{2} \|\phi_0 + \phi_1\|_s \le \tfrac{1}{2} \left( \|\phi_0\|_s + \|\phi_1\|_s \right) = C_0.$

Hence

$\|\phi_0 + \phi_1\|_s = \|\phi_0\|_s + \|\phi_1\|_s.$

This equality implies $\phi_1 = a\, \phi_0$ for some positive $a$. But since the norms of $\phi_0$
and $\phi_1$ are equal (and $\ne 0$) $a$ must be unity. Thus the uniqueness of $\phi_0$ is proved.





It remains now to prove, assuming (5) satisfied, the existence of $\phi_0$. Consider
the functional $F$ on $\mathfrak{P}_0$ defined by

$F(\pi_\theta) = h(\theta).$

The Hahn-Banach theorem alluded to in section 1 (viz., [13, p. 55, Theorem 4])
has precisely (5) as a necessary and sufficient condition for the existence of a
linear functional $G$ on $\mathfrak{L}_r$ satisfying

(a)   $G(\pi_\theta) = h(\theta), \qquad \theta \in \Theta;$

(b)   $\|G\| \le C;$

where $\|G\|$ is the norm of $G$, i.e.,

$\|G\| = \underset{\xi \in \mathfrak{L}_r}{\mathrm{l.u.b.}}\ \dfrac{|G(\xi)|}{\|\xi\|_r}.$

In particular, taking $C = C_0$, there is a linear functional $G_0$ on $\mathfrak{L}_r$ with

(a′)   $G_0(\pi_\theta) = h(\theta), \qquad \theta \in \Theta;$

(b′)   $\|G_0\| \le C_0.$

But, for an element $\sum_{i=1}^{n} a_i\, \pi_{\theta_i}$ in the linear manifold $[\mathfrak{P}_0]$ spanned by the $\pi_\theta$,

$G_0\left( \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right) = \sum_{i=1}^{n} a_i\, h(\theta_i),$

so that

$\|G_0\| \ge \underset{\xi \in [\mathfrak{P}_0]}{\mathrm{l.u.b.}}\ \dfrac{|G_0(\xi)|}{\|\xi\|_r} = C_0.$

Hence (b′) is replaced by the precise statement

(b″)   $\|G_0\| = C_0.$

Now the representation theorem for linear functionals on $\mathfrak{L}_r$ asserts the existence
of $\phi_0 \in \mathfrak{L}_s$ such that

$G_0(\xi) = \int_\Omega \phi_0 \cdot \xi\, d\nu,$

and

$\|\phi_0\|_s = \|G_0\| = C_0.$

This taken with (a′) establishes the existence of $\phi_0 \in \mathfrak{L}_s$ satisfying

$\int_\Omega \phi_0\, \pi_\theta\, d\nu = h(\theta), \quad \theta \in \Theta; \qquad \|\phi_0\|_s = C_0;$

and this completes the proof of the theorem.


It is readily seen that $\mathfrak{M}_s$ will consist of more than just $\phi_0$ if and only if there
exists a non-null functional on $\mathfrak{L}_r$ which vanishes on $\mathfrak{P}_0$. Our discussion in
section 3 therefore enables us to assert the following.

Corollary 2-1. $\mathfrak{M}_s$, when it is non-empty, consists of $\phi_0$ alone if and only if
$\mathfrak{P}_0$ is fundamental in $\mathfrak{L}_r$.

A word is in order concerning the following two consequences of the boundedness
of the measure $\nu$: (i) if $\mathfrak{P}_0 \subset \mathfrak{L}_r$, then also $\mathfrak{P}_0 \subset \mathfrak{L}_{r'}$ for every $r' < r$; (ii) if
$\phi \in \mathfrak{L}_s$ then also $\phi \in \mathfrak{L}_{s'}$ for every $s' < s$. Otherwise stated: (i′) if $\mathfrak{P}_0$ satisfies the
postulate of section 2 for the number $r$, it likewise satisfies this postulate for
every (admissible) $r' < r$; (ii′) if $\mathfrak{M}_s$ is non-empty, then $\mathfrak{M}_{s'}$ is non-empty for
every $s' < s$. Regarding (i′) we shall make only the obvious remark that although
$\mathfrak{P}_0$ satisfies the postulate for every $r' < r$, there may be values of $r' < r$ such
that no $C$ for (5) exists; this will be exemplified in section 9. Where (ii′) is
concerned, it is clear that the non-emptiness of $\mathfrak{M}_s$ will not necessarily imply that
$\mathfrak{P}_0 \subset \mathfrak{L}_{s'/(s'-1)}$ for every $s' < s$, even though for every such $s'$ $\mathfrak{M}_{s'}$ is non-empty.
If for every $\theta \in \Theta$ other than $\theta_0$ we have $\pi_\theta \notin \mathfrak{L}_{s'/(s'-1)}$, for some particular $s' < s$,
then we may have the situation in which there are elements in $\mathfrak{M}_{s'}$ with norms
arbitrarily close to 0. However, this cannot be the case if (a) for some $\theta$ other
than $\theta_0$, $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$, and (b) $h$ does not vanish identically on $\Theta'$, the set of those
$\theta$ for which $\pi_\theta \in \mathfrak{L}_{s'/(s'-1)}$. For, when these two conditions are satisfied, Theorem
2 applies to $h$ as defined on $\Theta'$; consequently there is a positive lower bound
for the $s'$-norms of the unbiased estimates of $h$ over $\Theta'$. And since every element
of $\mathfrak{M}_{s'}$ is, in particular, an unbiased estimate of $h$ over $\Theta'$, it follows that
the norms of those elements are bounded below by a positive number.
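Consequence (ii) of the boundedness of $\nu$ rests on the fact that, for a probability measure, the norm $\|\phi\|_s$ does not decrease as $s$ increases. A small numerical sketch, with an arbitrary four-point space and an arbitrary function $\phi$ chosen purely for illustration (none of these values come from the paper):

```python
def nu_norm(phi, nu, s):
    """s-th norm of phi with respect to the weights nu on a finite space."""
    return sum(w * abs(v)**s for v, w in zip(phi, nu)) ** (1.0 / s)

nu = [0.1, 0.2, 0.3, 0.4]        # a probability measure on four points (assumed)
phi = [3.0, -1.0, 0.5, 2.0]      # an arbitrary function on those points (assumed)

exponents = (1.5, 2.0, 3.0, 6.0)
norms = [nu_norm(phi, nu, s) for s in exponents]
```

The values in `norms` should be non-decreasing along `exponents`, so finiteness of the $s$-norm carries over to every smaller exponent $s'$.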

5. The case $s = \infty$ ($r = 1$). Let $\mathfrak{M}_\infty$ denote the class of essentially bounded
($\nu$) unbiased estimates of $h$; and let bestness at $\theta_0$ be defined with respect to the
essential absolute suprema of the elements of this class. That is, the unbiased
estimate $\phi_0$, of $h$, is best at $\theta_0$ if

$\operatorname*{ess\,sup}_{x \in \Omega} |\phi_0(x)| < \infty,$

and if, when $\phi$ is another unbiased estimate of $h$, we have

$\operatorname*{ess\,sup}_{x \in \Omega} |\phi_0(x)| \le \operatorname*{ess\,sup}_{x \in \Omega} |\phi(x)|.$

The fundamental postulate for the functions $\pi_\theta$ is, in this case, that $\mathfrak{P}_0 \subset \mathfrak{L}_1$.

Now, $\mathfrak{L}_\infty$, the space of essentially bounded, measurable ($\nu$) functions on $\Omega$,
normed by ess. sup., is the space of linear functionals on $\mathfrak{L}_1$ [14, p. 338]. Examination
of the proof of Theorem 2 will show that that proof goes through also in the
present case in all but one detail: we cannot here in general prove the uniqueness
of the best estimate. The proof of uniqueness breaks down since the equality

$\operatorname*{ess\,sup} |\phi_0(x) + \phi_1(x)| = \operatorname*{ess\,sup} |\phi_0(x)| + \operatorname*{ess\,sup} |\phi_1(x)|$

does not imply that $\phi_1$ is a constant multiple of $\phi_0$. Of course, if $\mathfrak{P}_0$ is fundamental
in $\mathfrak{L}_1$, we have a fortiori the uniqueness of the best estimate.
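The failure of the uniqueness argument can be seen already on a two-point space. In the sketch below (an illustrative assumption, not an example from the paper) both points carry positive $\nu$-mass, so the essential supremum is simply the maximum; the two functions have equal sup-norms and attain equality in the triangle inequality, yet neither is a multiple of the other. The same pair gives strict inequality in the 2-norm.

```python
from math import sqrt

nu = [0.5, 0.5]    # both points have positive mass, so ess sup is the plain maximum

def sup_norm(phi):
    """Essential supremum of |phi| with respect to nu."""
    return max(abs(v) for v, w in zip(phi, nu) if w > 0)

def l2_norm(phi):
    """2-norm of phi with respect to nu, for comparison."""
    return sqrt(sum(w * v * v for v, w in zip(phi, nu)))

phi0 = [1.0, 0.0]
phi1 = [1.0, 1.0]
total = [u + v for u, v in zip(phi0, phi1)]

# Equality holds in the triangle inequality for the sup-norm ...
sup_equality = sup_norm(total) == sup_norm(phi0) + sup_norm(phi1)
# ... although phi1 is not a constant multiple of phi0 (2 x 2 determinant is nonzero) ...
proportional = phi0[0] * phi1[1] == phi0[1] * phi1[0]
# ... whereas in the 2-norm the inequality is strict for this pair.
l2_strict = l2_norm(total) < l2_norm(phi0) + l2_norm(phi1)
```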

The results for the case $s = \infty$ are then the following.

Theorem 3. (i) A necessary and sufficient condition that $\mathfrak{M}_\infty$ be non-empty is
that there exist a constant $C$ such that for every set of $n$ functions $\pi_{\theta_1}, \pi_{\theta_2}, \cdots, \pi_{\theta_n}$
in $\mathfrak{P}_0$, and every set of $n$ real numbers $a_1, a_2, \cdots, a_n$, we have, for every $n = 1, 2, \cdots$,

$\left| \sum_{i=1}^{n} a_i\, h(\theta_i) \right| \le C \left\| \sum_{i=1}^{n} a_i\, \pi_{\theta_i} \right\|_1.$

(ii) For every $\phi \in \mathfrak{M}_\infty$ we have $\|\phi\|_\infty \ge C_0$, where $C_0$ is the g.l.b. of the set of
admissible constants $C$ above.

(iii) When $\mathfrak{M}_\infty$ is non-empty, it contains elements with norm equal to $C_0$. These
are the best (at $\theta_0$) unbiased estimates of $h$. When $\mathfrak{P}_0$ is not fundamental in $\mathfrak{L}_1$,
there need not exist a unique best estimate.

We close this section with the remark that Theorem 1 remains valid, as it
stands, in the case $s = \infty$.

6. Particular lower bounds for the minimum s.a.c.m. In order to stress their
significance in the statistical context, we shall give the statements of this section
with the help of the symbol $\sigma_s(\phi)$ for the $s$th root of the s.a.c.m. of the unbiased
estimate $\phi$ of $h$. We have, of course, the relation

\[ \sigma_s(\phi) = \|\phi\|_s . \]

Now, one of the most important aspects of Theorem 2 is that it presents us
immediately with an explicit evaluation of the minimum $\sigma_s(\phi)$ for all $\phi \in \mathfrak{M}_s$.
We state the formula in the form of a theorem.

Theorem 4. Let $\mathfrak{R}$ denote the set of all real numbers. Then,

\[ \operatorname*{g.l.b.}_{\phi \in \mathfrak{M}_s} \sigma_s(\phi)
 = \operatorname*{l.u.b.}_{\substack{\theta_1, \theta_2, \dots, \theta_n \in \Theta \\ a_1, a_2, \dots, a_n \in \mathfrak{R} \\ n = 1, 2, \dots}}
 \frac{\bigl| \sum_{i=1}^{n} a_i h(\theta_i) \bigr|}{\bigl\| \sum_{i=1}^{n} a_i \pi_{\theta_i} \bigr\|_r} . \]

For brevity, let us set

\[ \operatorname*{g.l.b.}_{\phi \in \mathfrak{M}_s} \sigma_s(\phi) = \sigma_s^{\min} . \]

Since this theorem expresses $\sigma_s^{\min}$ as the l.u.b. of an explicit set of numbers,
it is clear that the class of all lower bounds of $\sigma_s^{\min}$ is thereby thrown open to us.
It follows that, when $s = r = 2$ and our hypotheses on $\mathfrak{P}_\Theta$ are fulfilled, the classical
lower bounds of Cramér-Rao [3, p. 480] and Bhattacharyya [4, p. 3] are particularized
consequences of Theorem 4. In the results that follow here we shall
indicate the deduction of those classical bounds. We need not, however, restrict $s$.
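
As a worked illustration (not part of the original text), consider how the inner l.u.b. of Theorem 4 behaves when $s = r = 2$ and the points $\theta_1, \dots, \theta_n$ are held fixed. The sketch assumes that the Gram matrix $G$ of the functions $\pi_{\theta_i}$ is non-singular; the symbols $G$ and $\mathbf{h}$ are introduced here only for this illustration. The ratio is then a squared linear form over a positive definite quadratic form, and the Cauchy-Schwarz inequality gives its supremum in closed form:

```latex
\[
G = \Bigl( \int_\Omega \pi_{\theta_i} \pi_{\theta_j} \, d\nu \Bigr)_{i,j = 1, \dots, n},
\qquad
\mathbf{h} = \bigl( h(\theta_1), \dots, h(\theta_n) \bigr)^{\top},
\]
\[
\sup_{a \neq 0}
\frac{\bigl( \sum_{i} a_i h(\theta_i) \bigr)^2}{\bigl\| \sum_{i} a_i \pi_{\theta_i} \bigr\|_2^2}
= \sup_{a \neq 0} \frac{(a^{\top} \mathbf{h})^2}{a^{\top} G a}
= \mathbf{h}^{\top} G^{-1} \mathbf{h},
\]
\[
\text{so that} \quad (\sigma_2^{\min})^2 \geq \mathbf{h}^{\top} G^{-1} \mathbf{h}
\quad \text{for every such set of points.}
\]
```

This is the same "familiar extremum problem" that reappears in the example of Section 9.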

For a moment, let us denote by $\pi(x)$ the function on $\Theta$ which assigns the
value $\pi_\rho(x)$ to the point $\rho \in \Theta$, and let $\Theta$ be an interval on the real axis. Then we
shall, below, write $\pi'_\theta$ for the function (when it exists) on $\Omega$ which assigns the






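value $(d\pi(x)/d\rho)_{\rho = \theta}$ to $x \in \Omega$. Similarly, $\pi''_\theta$ for the function assigning the value
$(d^2\pi(x)/d\rho^2)_{\rho = \theta}$ to $x$; and so on.

Theorem 5. Suppose the following conditions fulfilled:

(i) $\Theta = \mathfrak{I}$, an interval on the real axis;

(ii) $h$ is differentiable on $\Theta' \subseteq \mathfrak{I}$;

(iii) for each $\theta \in \Theta'$, $\pi'_\theta$ is defined almost everywhere ($\nu$), and is an element of $\mathfrak{L}_r$;

(iv) for each $\theta \in \Theta'$,

\[ \lim_{\rho \to \theta} \Bigl\| \frac{\pi_\rho - \pi_\theta}{\rho - \theta} - \pi'_\theta \Bigr\|_r = 0 . \]

Then, for any $m + n$ ($m, n = 1, 2, \dots$) points $\theta_1, \theta_2, \dots, \theta_m$ in $\mathfrak{I}$, and $\theta'_1, \theta'_2,
\dots, \theta'_n$ in $\Theta'$, and any $m + n$ real numbers $a_1, a_2, \dots, a_m, b_1, b_2, \dots, b_n$ such that

\[ \Bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} \Bigr\|_r \neq 0, \]

we have

\[ \sigma_s^{\min} \geq
 \frac{\bigl| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i h'(\theta'_i) \bigr|}
      {\bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} \bigr\|_r} . \tag{6} \]

The prime on the $h$ in (6) denotes the derivative of $h$.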


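To prove this theorem, observe first that by virtue of Theorem 4, we may write

\[ \sigma_s^{\min} \geq
 \frac{\Bigl| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \Bigr|}
      {\Bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \Bigr\|_r} \]

for every set of points $\rho_1, \rho_2, \dots, \rho_n$ in $\mathfrak{I}$ such that the denominator of the right-hand
side is defined and $\neq 0$. Therefore, also

\[ \sigma_s^{\min} \geq \lim_{\rho_i \to \theta'_i}
 \frac{\Bigl| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i \dfrac{h(\rho_i) - h(\theta'_i)}{\rho_i - \theta'_i} \Bigr|}
      {\Bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \dfrac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \Bigr\|_r} . \tag{7} \]

Now, by condition (iv), the element

\[ \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \frac{\pi_{\rho_i} - \pi_{\theta'_i}}{\rho_i - \theta'_i} \]

of $\mathfrak{L}_r$ converges, in the strong sense in $\mathfrak{L}_r$, to

\[ \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} , \]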





as $\rho_i \to \theta'_i$, $i = 1, 2, \dots, n$. Consequently we have convergence of the norm;
that is, the denominator of the right-hand side of (7) converges to the denominator
of (6). (The latter is $\neq 0$, so that for all $\rho_i$ sufficiently close to $\theta'_i$, $i =
1, 2, \dots, n$, the ratios in (7) are defined.) There is no difficulty about the
convergence of the numerator of (7) to that of (6). The theorem is thus proved.

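Corollary 5-1. Under the hypothesis of Theorem 5, we have, in particular,
when $\theta_0 \in \Theta'$ and $\|\pi'_{\theta_0}\|_r \neq 0$,

\[ \sigma_s^{\min} \geq \frac{|h'(\theta_0)|}{\|\pi'_{\theta_0}\|_r} . \tag{8} \]

If we denote by $p$ the function on $\Omega \times \Theta$ which assigns the value $p_\theta(x)$ to the
point $(x, \theta)$, and write (8) in the form

\[ \sigma_s^{\min} \geq \frac{|h'(\theta_0)|}
 {\Bigl\{ \displaystyle\int_\Omega \Bigl| \frac{\partial \log p}{\partial \theta} \Bigr|^r_{\theta = \theta_0} p_{\theta_0} \, d\mu \Bigr\}^{1/r}} , \tag{8'} \]

the generalization of the Cramér-Rao inequality afforded by (8) becomes
evident.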
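
A minimal numerical sketch of (8') may help fix ideas. It assumes the unit-variance normal family with a single observation and $\theta_0 = 0$ (the family used in Section 9 with $n = 1$), for which $\partial \log p / \partial\theta$ at $\theta_0$ is simply $x$, so that the denominator of (8') is $(E|X|^r)^{1/r}$ with $X$ standard normal. The function names are hypothetical and chosen for this illustration only.

```python
import math


def abs_moment_numeric(r, half_width=12.0, steps=200_000):
    """Approximate E|X|^r for X ~ N(0, 1) with the trapezoidal rule."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(x) ** r * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)


def abs_moment_exact(r):
    """Closed form E|X|^r = 2^(r/2) * Gamma((r + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (r / 2.0) * math.gamma((r + 1.0) / 2.0) / math.sqrt(math.pi)


def generalized_cr_bound(h_prime_at_0, s):
    """Right-hand side of (8') for N(theta, 1), one observation, theta_0 = 0.

    Assumption of this sketch: r is the exponent conjugate to s,
    i.e. 1/r + 1/s = 1, and s > 1 is finite.
    """
    r = s / (s - 1.0)
    return abs(h_prime_at_0) / abs_moment_exact(r) ** (1.0 / r)
```

For $s = r = 2$ and $h(\theta) = \theta$ the function returns 1, the classical Cramér-Rao value for the standard deviation; for $s > 2$ the bound on the $s$th root of the $s$th absolute central moment is larger.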

Using the result and method of Theorem 5, we can establish the next in a
hierarchy of theorems.

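Theorem 6. Suppose the hypothesis of Theorem 5 satisfied, and the following
condition fulfilled: for each $\theta$ in a non-empty subset $\Theta''$ of $\Theta'$, (i) $h''(\theta)$ (the second
derivative) exists and (ii) $\pi''_\theta$ is defined almost everywhere ($\nu$), is an element of $\mathfrak{L}_r$,
and satisfies

\[ \lim_{\rho \to \theta} \Bigl\| \frac{\pi'_\rho - \pi'_\theta}{\rho - \theta} - \pi''_\theta \Bigr\|_r = 0 . \]

Then, for any $m + n + q$ ($m, n, q = 1, 2, \dots$) points $\theta_1, \theta_2, \dots, \theta_m$ in $\mathfrak{I}$,
$\theta'_1, \theta'_2, \dots, \theta'_n$ in $\Theta'$, and $\theta''_1, \theta''_2, \dots, \theta''_q$ in $\Theta''$, and any $m + n + q$ real numbers
$a_1, a_2, \dots, a_m$, $b_1, b_2, \dots, b_n$, $c_1, c_2, \dots, c_q$ such that

\[ \Bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i \pi''_{\theta''_i} \Bigr\|_r \neq 0, \]

we have

\[ \sigma_s^{\min} \geq
 \frac{\bigl| \sum_{i=1}^{m} a_i h(\theta_i) + \sum_{i=1}^{n} b_i h'(\theta'_i) + \sum_{i=1}^{q} c_i h''(\theta''_i) \bigr|}
      {\bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} + \sum_{i=1}^{n} b_i \pi'_{\theta'_i} + \sum_{i=1}^{q} c_i \pi''_{\theta''_i} \bigr\|_r} . \]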

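Just as in the case of the previous theorem, we have here an immediate corollary.

Corollary 6-1. Under the hypothesis of Theorem 6, we have in particular, when
$\theta_0 \in \Theta' \cdot \Theta''$,

\[ \sigma_s^{\min} \geq \frac{|b\,h'(\theta_0) + c\,h''(\theta_0)|}{\|b\,\pi'_{\theta_0} + c\,\pi''_{\theta_0}\|_r} \tag{9} \]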





for any two real numbers, b and c, such that the denominator of the right-hand side 
does not vanish. 

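Consider (9) in the particular case $s = r = 2$. In this case, (9) may be written,
explicitly,

\[ (\sigma_2^{\min})^2 \geq
 \frac{[\,b\,h'(\theta_0) + c\,h''(\theta_0)\,]^2}
      {\displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \Bigl( b \frac{\partial p}{\partial \theta} + c \frac{\partial^2 p}{\partial \theta^2} \Bigr)^2_{\theta = \theta_0} d\mu} . \tag{10} \]

In particular, (10) holds for values of $b$ and $c$ which maximize the right-hand
side. And that maximum value is found, in the usual way, to be

\[ J^{11} [h'(\theta_0)]^2 + 2 J^{12} h'(\theta_0) h''(\theta_0) + J^{22} [h''(\theta_0)]^2 , \]

where the matrix

\[ \begin{pmatrix} J^{11} & J^{12} \\ J^{12} & J^{22} \end{pmatrix} \]

is the inverse of the matrix

\[ \begin{pmatrix}
 \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \Bigl( \frac{\partial p}{\partial \theta} \Bigr)^2 d\mu &
 \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \, d\mu \\[2ex]
 \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \frac{\partial p}{\partial \theta} \frac{\partial^2 p}{\partial \theta^2} \, d\mu &
 \displaystyle\int_\Omega \frac{1}{p_{\theta_0}} \Bigl( \frac{\partial^2 p}{\partial \theta^2} \Bigr)^2 d\mu
 \end{pmatrix} , \]

the derivatives being evaluated at $\theta = \theta_0$. Thus, we have

\[ (\sigma_2^{\min})^2 \geq J^{11} [h'(\theta_0)]^2 + 2 J^{12} h'(\theta_0) h''(\theta_0) + J^{22} [h''(\theta_0)]^2 . \tag{11} \]

This is seen to be Bhattacharyya's result for the case of derivatives up to second
order.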
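
The second-order bound can be checked numerically in a simple case. The sketch below assumes the unit-variance normal family with one observation and $\theta_0 = 0$, for which $(1/p_{\theta_0})\,\partial p/\partial\theta = x$ and $(1/p_{\theta_0})\,\partial^2 p/\partial\theta^2 = x^2 - 1$ at $\theta_0$; the helper names are hypothetical, and the integrals are approximated with a plain trapezoidal rule.

```python
import math


def gauss_expect(f, half_width=12.0, steps=40_000):
    """Trapezoidal approximation of E f(X) for X ~ N(0, 1)."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * dx / math.sqrt(2.0 * math.pi)


def bhattacharyya_bound_order2(h1, h2):
    """Second-order variance bound for N(theta, 1) at theta_0 = 0.

    h1 and h2 stand for h'(theta_0) and h''(theta_0).
    """
    d1 = lambda x: x              # (1/p) dp/dtheta at theta_0 = 0
    d2 = lambda x: x * x - 1.0    # (1/p) d^2p/dtheta^2 at theta_0 = 0
    j11 = gauss_expect(lambda x: d1(x) ** 2)
    j12 = gauss_expect(lambda x: d1(x) * d2(x))
    j22 = gauss_expect(lambda x: d2(x) ** 2)
    det = j11 * j22 - j12 * j12
    # Entries of the inverse of the 2x2 matrix (j11, j12; j12, j22).
    i11, i12, i22 = j22 / det, -j12 / det, j11 / det
    return i11 * h1 * h1 + 2.0 * i12 * h1 * h2 + i22 * h2 * h2
```

For $h(\theta) = \theta^2$ we have $h'(0) = 0$ and $h''(0) = 2$, so the first-order bound (8) is zero while the second-order bound is 2. That value is attained: $x^2 - 1$ is an unbiased estimate of $\theta^2$ whose variance at $\theta = 0$ is 2.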

It is obvious how we extend Theorem 6 to obtain a similar result involving the
functions $\pi_\theta, \pi'_\theta, \pi''_\theta, \dots, \pi^{(n)}_\theta$, for any assigned $n$. And it is thereafter clear
how, in the case $s = r = 2$, Bhattacharyya's general inequality may be deduced.

It is clear that we can proceed from Theorem 4, under suitable conditions,
to lower bounds for $\sigma_s^{\min}$ which involve integrals of the functions $\pi(x)$ (and the
corresponding integrals of $h$) as well as the derivatives of these functions.

In closing this section we note that all the above considerations apply equally
to the case $s = \infty$.

7. Determination of the best estimate. We shall now prove the following
theorem, which provides an explicit construction of the best (at $\theta_0$) estimate of $h$.
We repeat that $s$ is now taken to be finite.

Theorem 7. Let $\mathfrak{M}_s$ be non-empty, and $\phi_0$ be the best (at $\theta_0$) unbiased estimate of $h$.
Let $\{\theta_i^n, i = 1, 2, \dots, k_n\}$, $n = 1, 2, \dots$, be a sequence of (finite) sets of points of
$\Theta$, and $\{a_i^n, i = 1, 2, \dots, k_n\}$, $n = 1, 2, \dots$, a sequence of sets of real numbers,
such that

\[ \lim_{n \to \infty}
 \frac{\bigl| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \bigr|}{\bigl\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \bigr\|_r}
 = C_0 = \|\phi_0\|_s = \sigma_s^{\min} . \]





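Then the functions

\[ \phi_n(x) = \frac{\sum_{i=1}^{k_n} a_i^n h(\theta_i^n)}{\bigl\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \bigr\|_r^{\,r}}
 \cdot \Bigl| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \Bigr|^{r/s}
 \operatorname{sgn} \Bigl( \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \Bigr) \]

(are elements of $\mathfrak{L}_s$ and) converge strongly in $\mathfrak{L}_s$ to $\phi_0$.
The strong convergence here means precisely that

\[ \lim_{n \to \infty} \int_\Omega |\phi_n - \phi_0|^s \, d\nu = 0 . \]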


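Clearly, we may, with no loss in generality, assume the numbers $a_i^n$ to be
such that

\[ \Bigl\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \Bigr\|_r = 1, \qquad n = 1, 2, \dots . \tag{12} \]

We shall suppose this to be the case throughout the proof. Then the essential property
of the $\theta_i^n$ and the $a_i^n$ is that

\[ \lim_{n \to \infty} \Bigl| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \Bigr| = C_0 . \tag{13} \]

And in this normalized situation, the functions $\phi_n$ will be given by

\[ \phi_n(x) = \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \cdot
 \Bigl| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \Bigr|^{r/s}
 \operatorname{sgn} \Bigl( \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}(x) \Bigr) . \tag{14} \]

That these functions are elements of $\mathfrak{L}_s$ is easily seen; in fact,

\[ \|\phi_n\|_s = \Bigl| \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \Bigr| . \]

The proof of this theorem will consist mainly in the application of the following
two lemmas.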

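Lemma 2. Let $\eta \in \mathfrak{L}_s$, and $\{\xi_n, n = 1, 2, \dots\}$ be a sequence of functions in
$\mathfrak{L}_r$ such that

(i) $\|\xi_n\|_r = 1$, $n = 1, 2, \dots$;

(ii) $\lim_{n \to \infty} \int_\Omega \xi_n \eta \, d\nu = \|\eta\|_s$.

Then $\xi_n$ converges strongly in $\mathfrak{L}_r$ to the function

\[ \xi_0 = \frac{1}{\|\eta\|_s^{s/r}} \, |\eta|^{s/r} \operatorname{sgn} \eta . \]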


Let us observe first that

\[ \int_\Omega \xi_0 \eta \, d\nu = \|\eta\|_s \tag{15} \]

and

\[ \|\xi_0\|_r = 1 . \]
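
For the reader's convenience, here is a short verification (added here, not in the original) of the two facts just observed about $\xi_0$, namely that its integral against $\eta$ equals $\|\eta\|_s$ and that its $r$-norm is 1. It uses only the definition of $\xi_0$ in Lemma 2 and the assumption that $r$ and $s$ are conjugate exponents, $1/r + 1/s = 1$, so that $s/r + 1 = s$:

```latex
\[
\int_\Omega \xi_0 \eta \, d\nu
 = \frac{1}{\|\eta\|_s^{s/r}} \int_\Omega |\eta|^{s/r + 1} \, d\nu
 = \frac{\|\eta\|_s^{s}}{\|\eta\|_s^{s/r}}
 = \|\eta\|_s^{\,s - s/r}
 = \|\eta\|_s ,
\]
\[
\|\xi_0\|_r^r
 = \frac{1}{\|\eta\|_s^{s}} \int_\Omega |\eta|^{s} \, d\nu
 = 1 .
\]
```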

Furthermore, $\xi_0$ is the unique element with norm $\leq 1$ in $\mathfrak{L}_r$ having the property
(15). For, if also,

\[ \int_\Omega \bar\xi_0 \eta \, d\nu = \|\eta\|_s , \qquad \|\bar\xi_0\|_r \leq 1, \]

we then have

\[ \int_\Omega \tfrac{1}{2} (\xi_0 + \bar\xi_0) \eta \, d\nu = \|\eta\|_s ; \]

and from this,

\[ \tfrac{1}{2} \|\xi_0 + \bar\xi_0\|_r \, \|\eta\|_s \geq \|\eta\|_s . \]

That is,

\[ \|\xi_0 + \bar\xi_0\|_r \geq 2 \geq \|\xi_0\|_r + \|\bar\xi_0\|_r . \]

From this, and (Minkowski)

\[ \|\xi_0 + \bar\xi_0\|_r \leq \|\xi_0\|_r + \|\bar\xi_0\|_r , \]

we have

\[ \|\xi_0 + \bar\xi_0\|_r = \|\xi_0\|_r + \|\bar\xi_0\|_r . \]

Therefore, for some $a > 0$, $\bar\xi_0 = a \xi_0$. But we must have $a = 1$ if $\xi_0$ and $\bar\xi_0$ are
both to satisfy (15), as assumed. Hence $\bar\xi_0 = \xi_0$.

Now consider the sequence $\{\xi_n\}$. Choose a sub-sequence $\{\xi_{n_i}\}$ that converges
weakly to, say, $\xi'$. Then $\|\xi'\|_r \leq 1$. We have

\[ \int_\Omega \xi' \eta \, d\nu = \lim_{i \to \infty} \int_\Omega \xi_{n_i} \eta \, d\nu = \|\eta\|_s . \]

Hence, $\xi' = \xi_0$. And since $1 = \|\xi_{n_i}\|_r \to 1 = \|\xi_0\|_r$, it follows that $\xi_{n_i}$ converges
strongly to $\xi_0$ (cf. [13, p. 139, section 3]).

Suppose there is a subsequence $\{\xi_{n_\nu}\}$ of $\{\xi_n\}$ such that

\[ \|\xi_{n_\nu} - \xi_0\|_r > \delta > 0, \qquad \nu = 1, 2, \dots . \]

We have, nonetheless, for this subsequence, the hypotheses of our lemma
satisfied. We can therefore apply the argument of the previous paragraph to
extract a subsequence of $\{\xi_{n_\nu}\}$ which converges strongly to $\xi_0$. This is in obvious
contradiction to the above $\delta$-assumption, and the lemma is hereby proved.

Lemma 3. Lemma 2 remains true with the roles of $\mathfrak{L}_r$ and $\mathfrak{L}_s$ interchanged.

This is obvious.

Returning now to the proof of Theorem 7, let us first, for the sake of brevity, 





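introduce the notation:

\[ c_n = \sum_{i=1}^{k_n} a_i^n h(\theta_i^n), \qquad
 \gamma_n = \operatorname{sgn}(c_n), \qquad
 \psi_n = \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} . \]

From

\[ \int_\Omega \phi_0 \pi_\theta \, d\nu = h(\theta), \qquad \theta \in \Theta, \]

we easily obtain

\[ \int_\Omega \phi_0 \psi_n \, d\nu = c_n , \qquad n = 1, 2, \dots, \]

which we may write

\[ \int_\Omega \gamma_n \psi_n \phi_0 \, d\nu = |c_n|, \qquad n = 1, 2, \dots . \]

Since $|c_n| \to \|\phi_0\|_s$ (cf. (13)) and $\|\gamma_n \psi_n\|_r = 1$, $n = 1, 2, \dots$ (cf. (12)), we
have, by Lemma 2, that $\gamma_n \psi_n$ converges strongly to

\[ \zeta_0 = \frac{1}{C_0^{\,s/r}} \, |\phi_0|^{s/r} \operatorname{sgn} \phi_0 . \tag{16} \]

The functions (cf. (14))

\[ \phi_n = c_n |\psi_n|^{r/s} \operatorname{sgn} \psi_n \]

obviously satisfy

\[ \int_\Omega \phi_n \gamma_n \psi_n \, d\nu = |c_n|, \qquad n = 1, 2, \dots . \]

And from this we conclude that

\[ \lim_{n \to \infty} \int_\Omega \phi_n \zeta_0 \, d\nu = C_0 , \]

or

\[ \lim_{n \to \infty} \int_\Omega \frac{\phi_n}{|c_n|} \, \zeta_0 \, d\nu = 1 = \|\zeta_0\|_r . \]

We may apply Lemma 3 to this result, since $\| \phi_n / |c_n| \|_s = 1$, $n = 1, 2, \dots$.
And we thereby conclude that $\phi_n / |c_n|$ converges strongly to

\[ |\zeta_0|^{r/s} \operatorname{sgn} \zeta_0 , \]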





which, on substituting from the definition (16) of $\zeta_0$, we find to be just

\[ \frac{\phi_0}{C_0} . \]

Since $|c_n| \to C_0$, it follows immediately that $\phi_n$ converges strongly to $\phi_0$, and
the theorem is proved.

The following corollary is actually of greater use in applications than Theorem
7 itself, for the reason that it leaves no doubt about the form of $\lim \phi_n$ (i.e., $\phi_0$)
when we know explicitly the form of $\lim \gamma_n \psi_n$.

Corollary 7-1. Assume the hypothesis of Theorem 7. Then the functions

\[ \gamma_n \psi_n = \operatorname{sgn} \Bigl( \sum_{i=1}^{k_n} a_i^n h(\theta_i^n) \Bigr) \cdot
 \frac{\sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n}}{\bigl\| \sum_{i=1}^{k_n} a_i^n \pi_{\theta_i^n} \bigr\|_r} \]

converge strongly, in $\mathfrak{L}_r$, to a function $\zeta_0$, and

\[ \phi_0 = C_0 \, |\zeta_0|^{r/s} \operatorname{sgn} \zeta_0 . \]

This is clear from the proof of the theorem.

By way of illustrating the application of these results, we shall prove the
following theorem.

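Theorem 8. Assume the hypothesis of Theorem 5. And, further, let the equality
sign hold in (8). Then,

\[ \phi_0(x) = \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^{\,r}} \, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x) . \]

Since (8) is an equality, we may, under the hypothesis of Theorem 5, consider
that we have

\[ C_0 = \lim_{n \to \infty}
 \frac{\Bigl| \dfrac{1}{\rho_n - \theta_0} h(\rho_n) - \dfrac{1}{\rho_n - \theta_0} h(\theta_0) \Bigr|}
      {\Bigl\| \dfrac{1}{\rho_n - \theta_0} \pi_{\rho_n} - \dfrac{1}{\rho_n - \theta_0} \pi_{\theta_0} \Bigr\|_r} , \tag{17} \]

where $\{\rho_n\}$ is a sequence in $\mathfrak{I}$ converging to $\theta_0$. The numerator of the right-hand
side of (17), sans the vertical bars, converges to $h'(\theta_0)$ (which is $\neq 0$, since
$C_0 \neq 0$); hence, for all sufficiently large $n$, that expression has the signum of
$h'(\theta_0)$. The functions whose norms appear in the denominator of (17) we know
to converge strongly in $\mathfrak{L}_r$ to $\pi'_{\theta_0}$ (by the hypothesis of Theorem 5). Hence, for
this case, the function $\zeta_0$ of Corollary 7-1 is

\[ \zeta_0 = \operatorname{sgn} h'(\theta_0) \cdot \frac{\pi'_{\theta_0}}{\|\pi'_{\theta_0}\|_r} . \]

Therefore, by the same corollary,

\[ \phi_0(x) = \frac{|h'(\theta_0)|}{\|\pi'_{\theta_0}\|_r}
 \cdot \Bigl| \operatorname{sgn} h'(\theta_0) \, \frac{\pi'_{\theta_0}(x)}{\|\pi'_{\theta_0}\|_r} \Bigr|^{r/s}
 \cdot \operatorname{sgn} h'(\theta_0) \cdot \operatorname{sgn} \pi'_{\theta_0}(x)
 = \frac{h'(\theta_0)}{\|\pi'_{\theta_0}\|_r^{\,r}} \, |\pi'_{\theta_0}(x)|^{r/s} \operatorname{sgn} \pi'_{\theta_0}(x) . \]

And this is the result asserted in the theorem.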

The reader will have no difficulty in establishing, in the exact pattern of the
preceding proof, the following.

Theorem 9. Assume the hypothesis of Theorem 6. And, further, let the equality
sign hold in (9) for $b = b_0$, $c = c_0$.$^6$ Then,

\[ \phi_0(x) = \frac{b_0 h'(\theta_0) + c_0 h''(\theta_0)}{\|b_0 \pi'_{\theta_0} + c_0 \pi''_{\theta_0}\|_r^{\,r}}
 \, |b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x)|^{r/s}
 \operatorname{sgn} \bigl( b_0 \pi'_{\theta_0}(x) + c_0 \pi''_{\theta_0}(x) \bigr) . \]


It is evident that results of the type in these theorems may be built up as
well with integrals over the parameter space.

A question of considerable practical importance is that of the rapidity of
convergence of the $\phi_n$ to $\phi_0$. An answer to this question, on the level of generality
we are maintaining in this study, consists in relating this convergence to that
of the $|c_n|$ to $C_0$. In the case $s = r = 2$, the answer is immediate and exact:

\[ \int_\Omega (\phi_n - \phi_0)^2 \, d\nu
 = \int_\Omega \phi_n^2 \, d\nu - 2 \int_\Omega \phi_0 \phi_n \, d\nu + \int_\Omega \phi_0^2 \, d\nu
 = |c_n|^2 - 2 |c_n|^2 + C_0^2
 = C_0^2 - |c_n|^2 . \]

Thus, if one unbiased estimate is known, it provides, since its norm is $\geq C_0$,
an upper bound for $\|\phi_n - \phi_0\|_2$. The same is true in the general case (any $s$)
once we have established an upper bound, depending on $C_0$ and $|c_n|$, for
$\|\phi_n - \phi_0\|_s$. But in the general case, a good upper bound does not seem to be
so close at hand. There are indications of the direction in which one must proceed,
and we hope to draw some significant results out of these before long.



8. The case $s = r = 2$. The particular aspects of this case (where bestness
of an estimate has reference to its variance), which arise out of the coincidence of
$\mathfrak{L}_r$ and $\mathfrak{L}_s$, merit some discussion. We shall denote the inner product, $\int_\Omega \xi \eta \, d\nu$, of
two functions $\xi$ and $\eta$ in $\mathfrak{L}_2$, as usual by $(\xi, \eta)$. Let $\{\mathfrak{P}_\Theta\}$ denote the closed linear
manifold in $\mathfrak{L}_2$ spanned by the $\pi_\theta$.

Theorem 10. Let $\mathfrak{M}_2$ be non-empty. Then $\phi_0$ is the unique element of $\mathfrak{M}_2$ which
lies in $\{\mathfrak{P}_\Theta\}$.

$^6$ In the case $s = 2$, $b_0$ and $c_0$ are the values which render (11) an equality.





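To begin with it is clear that the functions $\phi_n$ of Theorem 7, in the present case
$s = r = 2$, are all elements of $[\mathfrak{P}_\Theta]$, the linear manifold spanned by the $\pi_\theta$.
Hence, since $\phi_0$ is the strong limit of these elements, $\phi_0 \in \{\mathfrak{P}_\Theta\}$.

Now suppose also $\phi_1 \in \mathfrak{M}_2$, $\phi_1 \in \{\mathfrak{P}_\Theta\}$. Then, from

\[ (\phi_0, \pi_\theta) = h(\theta), \qquad \theta \in \Theta, \]

\[ (\phi_1, \pi_\theta) = h(\theta), \qquad \theta \in \Theta, \]

we have

\[ (\phi_1 - \phi_0, \pi_\theta) = 0, \qquad \theta \in \Theta, \]

and, by continuity of the inner product,

\[ (\phi_1 - \phi_0, \xi) = 0, \qquad \xi \in \{\mathfrak{P}_\Theta\}; \]

that is, $\phi_1 - \phi_0 \in \{\mathfrak{P}_\Theta\}^{\perp}$. But, from $\phi_0 \in \{\mathfrak{P}_\Theta\}$ and $\phi_1 \in \{\mathfrak{P}_\Theta\}$ it follows that
$\phi_1 - \phi_0 \in \{\mathfrak{P}_\Theta\}$. Hence $\phi_1 - \phi_0 = 0$, and this proves the exclusiveness of the
property for $\phi_0$.

Another characterization of $\phi_0$ is given by the following corollary.

Corollary 10-1. If $\mathfrak{M}_2$ is non-empty, then $\phi_0$ is the unique element of $\mathfrak{M}_2$ which
satisfies the system of equations in $\xi$: $(\phi, \xi) = \|\xi\|_2^2$, $\phi \in \mathfrak{M}_2$.

To see that $\phi_0$ has the asserted property, let $\phi$ be any element of $\mathfrak{M}_2$, and set
$\phi = \xi + \eta$, with $\xi \in \{\mathfrak{P}_\Theta\}$ and $\eta \in \{\mathfrak{P}_\Theta\}^{\perp}$. From

\[ (\xi, \pi_\theta) = (\xi + \eta, \pi_\theta) = (\phi, \pi_\theta) = h(\theta), \]

it follows that $\xi \in \mathfrak{M}_2$. Hence $\xi = \phi_0$. And so,

\[ (\phi, \phi_0) = (\phi_0 + \eta, \phi_0) = \|\phi_0\|_2^2 . \]

If $\phi_1 \in \mathfrak{M}_2$ has this property also, then both

\[ (\phi_1, \phi_0) = \|\phi_0\|_2^2 \]

and

\[ (\phi_0, \phi_1) = \|\phi_1\|_2^2 ; \]

and therefore

\[ \|\phi_1\|_2 = \|\phi_0\|_2 . \]

This proves $\phi_1 = \phi_0$, and so the corollary.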

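9. An example. Let $\Omega$ be Euclidean $n$-space, $x = (x_1, x_2, \dots, x_n)$; $\mu$, Lebesgue
measure; $\Theta$, the set of real numbers; and

\[ p_\theta(x) = (2\pi)^{-n/2} \exp \Bigl\{ -\tfrac{1}{2} \sum_{i=1}^{n} (x_i - \theta)^2 \Bigr\} . \]

And finally, let $\theta_0 = 0$. Then

\[ \pi_\theta(x) = \exp \Bigl\{ -\tfrac{1}{2} \sum_{i=1}^{n} (-2\theta x_i + \theta^2) \Bigr\} . \]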





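If $0 < b < \tfrac{1}{2}$ and we define

\[ \phi_b(x) = (1 - 2b)^{n/2} \exp \Bigl\{ b \sum_{i=1}^{n} x_i^2 \Bigr\} - 1, \]

we have, for each $\theta$,

\[ \int_\Omega \pi_\theta(x) \phi_b(x) \, d\nu = \exp \Bigl\{ \frac{nb}{1 - 2b} \, \theta^2 \Bigr\} - 1 . \]

Thus, $\phi_b$ is an unbiased estimate of the function $h$:

\[ h(\theta) = \exp \Bigl\{ \frac{nb}{1 - 2b} \, \theta^2 \Bigr\} - 1 . \]

If we examine

\[ \|\phi_b\|_s^s = (2\pi)^{-n/2} \int_\Omega \Bigl| (1 - 2b)^{n/2} \exp \Bigl\{ b \sum_{i=1}^{n} x_i^2 \Bigr\} - 1 \Bigr|^s
 \exp \Bigl\{ -\tfrac{1}{2} \sum_{i=1}^{n} x_i^2 \Bigr\} \, dx , \]

we find that this integral converges only for $s < 1/(2b)$. Shifting the emphasis,
we may state: for the function $h$, defined by

\[ h(\theta) = e^{a\theta^2} - 1, \qquad a > 0, \]

there exists an unbiased estimate with finite $s$th moment at $\theta = 0$, for each

\[ s < \frac{n + 2a}{2a} . \]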
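
The unbiasedness claim and the relation between $b$, $a$ and the critical exponent can be checked numerically. The following is a minimal sketch for $n = 1$ only; the function names are hypothetical, and the expectation under $N(\theta, 1)$ is approximated by a trapezoidal rule on a wide, fixed interval.

```python
import math


def expect_under_normal(f, theta, half_width=30.0, steps=60_000):
    """Trapezoidal approximation of E f(X) for X ~ N(theta, 1)."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * math.exp(-0.5 * (x - theta) ** 2)
    return total * dx / math.sqrt(2.0 * math.pi)


def phi_b(x, b):
    """The estimate (1 - 2b)^(1/2) * exp(b x^2) - 1 for a single observation."""
    return math.sqrt(1.0 - 2.0 * b) * math.exp(b * x * x) - 1.0


def a_from_b(n, b):
    """Coefficient a in h(theta) = exp(a theta^2) - 1 induced by b."""
    return n * b / (1.0 - 2.0 * b)


def critical_s(n, a):
    """The exponent (n + 2a) / (2a) below which a finite s-th moment exists."""
    return (n + 2.0 * a) / (2.0 * a)
```

With $b = 0.2$ and $n = 1$ one gets $a = 1/3$ and a critical exponent of $2.5 = 1/(2b)$, in agreement with the convergence condition stated above.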


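Next, observe that

\[ \|\pi_\theta\|_r^r = (2\pi)^{-n/2} \int_\Omega \exp \Bigl\{ -\tfrac{1}{2} \sum_{i=1}^{n} (x_i^2 - 2 r \theta x_i + r \theta^2) \Bigr\} \, dx
 = \exp \bigl\{ \tfrac{1}{2} n r (r - 1) \theta^2 \bigr\} , \]

so that the $\pi_\theta$ are elements of $\mathfrak{L}_r$ for each $r > 1$. The ratio

\[ \frac{|h(\theta)|}{\|\pi_\theta\|_r} = (e^{a\theta^2} - 1) \exp \bigl\{ -\tfrac{1}{2} n (r - 1) \theta^2 \bigr\} \]

is seen to diverge as $\theta \to \infty$ if

\[ \tfrac{1}{2} n (r - 1) < a . \]

Hence, by Theorem 2, there exists no unbiased estimate of $h$ belonging to $\mathfrak{L}_s$
for a value of $s$ such that the number

\[ r = \frac{s}{s - 1} \]

satisfies the inequality just above; that is, for a value of $s$ greater than

\[ \frac{n + 2a}{2a} . \]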





Otherwise stated: there exists no unbiased estimate of $h$ with finite $s$th moment at
$\theta = 0$, for

\[ s > \frac{n + 2a}{2a} . \]

It is most likely true that this last statement holds, in general, with

\[ s \geq \frac{n + 2a}{2a} . \]

We shall consider here only the case

\[ \frac{n + 2a}{2a} = 2 ; \]

and since the analysis is the same for every pair $n$, $a$ satisfying this equality,
we treat the particular case of

\[ n = 1, \qquad a = \tfrac{1}{2} . \]

Thus, we shall show: for $n = 1$, there exists no unbiased estimate of $h$,

\[ h(\theta) = e^{\frac{1}{2}\theta^2} - 1, \]

with finite variance at $\theta = 0$.


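We must show that the ratios

\[ \frac{\bigl| \sum_{i=1}^{m} a_i (e^{\frac{1}{2}\theta_i^2} - 1) \bigr|}{\bigl\| \sum_{i=1}^{m} a_i \pi_{\theta_i} \bigr\|_2} \]

are not bounded for all choices of $m$ (distinct) $\theta_i$'s, and all sets of $m$ real numbers
$a_i$, and all $m$. This is clearly equivalent to showing the same for the ratios

\[ Q(m, a_i, \theta_i) =
 \frac{\bigl| \sum_{i=1}^{m} a_i (1 - e^{-\frac{1}{2}\theta_i^2}) \bigr|}
      {\bigl\| \sum_{i=1}^{m} a_i e^{-\frac{1}{2}\theta_i^2} \pi_{\theta_i} \bigr\|_2} . \]

Now we find, by direct computation,

\[ \Bigl\| \sum_{i=1}^{m} a_i e^{-\frac{1}{2}\theta_i^2} \pi_{\theta_i} \Bigr\|_2^2
 = \sum_{i,j=1}^{m} e^{-\frac{1}{2}(\theta_i - \theta_j)^2} a_i a_j . \]

And the solution of the familiar extremum problem:

\[ \sup_{(a_i)} \Bigl| \sum_{i=1}^{m} a_i (1 - e^{-\frac{1}{2}\theta_i^2}) \Bigr|^2
 \quad \text{subject to} \quad \sum_{i,j=1}^{m} e^{-\frac{1}{2}(\theta_i - \theta_j)^2} a_i a_j = 1 \]

yields

\[ \sup_{(a_i)} Q^2(m, a_i, \theta_i)
 = \sum_{i,j=1}^{m} v_{ij} (1 - e^{-\frac{1}{2}\theta_i^2})(1 - e^{-\frac{1}{2}\theta_j^2}) , \]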




where the matrix

\[ V = (v_{ij}), \qquad i, j = 1, 2, \dots, m, \]

is the inverse of the matrix

\[ U = \bigl( e^{-\frac{1}{2}(\theta_i - \theta_j)^2} \bigr), \qquad i, j = 1, 2, \dots, m. \]

We now take, in particular,

\[ \theta_i = it, \qquad i = 1, 2, \dots, m, \]

where $t$ is a positive number. Clearly, there exists a number $t_0$ such that for
$t > t_0$,

\[ U(t) = \bigl( e^{-\frac{1}{2}(i - j)^2 t^2} \bigr) \]

is non-singular. Also,

\[ \lim_{t \to \infty} U(t) = I, \]

the identity matrix. Then, for $t > t_0$, $V = U^{-1}$ is a continuous function of $U$,
so that

\[ \lim_{t \to \infty} V(t) = \bigl( \lim_{t \to \infty} U(t) \bigr)^{-1} = I. \]

Hence,

\[ \lim_{t \to \infty} v_{ij}(t) = \delta_{ij} . \]

It follows that

\[ \lim_{t \to \infty} \sup_{(a_i)} Q^2(m, a_i, it) = m, \]

and therefore,

\[ \sup_{(a_i, \theta_i)} Q^2(m, a_i, \theta_i) \geq m. \]

(A simple argument on the characteristic values of $U$ shows that there is actually
equality here.) This result gives the unboundedness of the ratios $Q$ and our
proposition is proved, by virtue of Theorem 2.
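
The limiting behaviour used in this argument is easy to observe numerically. The sketch below, which is illustrative only, evaluates the quadratic form $\sum v_{ij} w_i w_j$ with $w_i = 1 - e^{-\frac{1}{2}\theta_i^2}$ for the choice $\theta_i = it$ by solving $U y = w$; the helper names and the plain Gaussian-elimination solver are assumptions of this example and not part of the paper.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def sup_q_squared(m, t):
    """sup over (a_i) of Q^2 for theta_i = i * t, computed as w^T U^{-1} w."""
    thetas = [i * t for i in range(1, m + 1)]
    u = [[math.exp(-0.5 * (ti - tj) ** 2) for tj in thetas] for ti in thetas]
    w = [1.0 - math.exp(-0.5 * ti * ti) for ti in thetas]
    y = solve_linear(u, w)  # y = U^{-1} w
    return sum(wi * yi for wi, yi in zip(w, y))
```

For a spacing such as $t = 6$ the off-diagonal entries of $U$ are of order $e^{-18}$, and `sup_q_squared(m, 6.0)` is numerically indistinguishable from $m$, so the ratios grow without bound as $m$ increases.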


APPENDIX 

The spaces $\mathfrak{L}_r$ and $\mathfrak{L}_s$ are instances of a Banach space over the reals; that is, a
complete, normed, linear vector space, closed under multiplication by real
numbers. That the space, say $\mathfrak{B}$, is normed is to say that there is a non-negative,
real-valued function, $\| \cdot \|$, defined on $\mathfrak{B}$, with the properties:

\[ \|\xi\| = 0 \text{ if and only if } \xi \text{ is the null vector,} \]

\[ \|a\xi\| = |a| \cdot \|\xi\| , \]

\[ \|\xi + \eta\| \leq \|\xi\| + \|\eta\| ; \]

where $\xi, \eta \in \mathfrak{B}$ and $a$ is real. The number $\|\xi\|$ is called the norm of $\xi$.





The function $\|\xi - \eta\|$ on pairs $\xi, \eta$ of vectors is a distance function in the
usual sense. With it, strong convergence (or simply convergence) is defined in $\mathfrak{B}$: $\xi_n$
converges strongly to $\xi$ when $\lim_{n \to \infty} \|\xi_n - \xi\| = 0$. In symbols: $\xi_n \to \xi$ or $\lim \xi_n = \xi$.
The usual set-theoretic notions are now defined in the obvious way; e.g., limit
point of a set, closed set, etc. That the space $\mathfrak{B}$ is complete means that every
sequence $\{\xi_n\}$ satisfying $\lim_{m, n \to \infty} \|\xi_m - \xi_n\| = 0$ converges to a (unique) element
$\xi \in \mathfrak{B}$.

A linear manifold, $\mathfrak{M}$, in $\mathfrak{B}$ is a subset of $\mathfrak{B}$ with the property that for any two
elements $\xi, \eta \in \mathfrak{M}$ and any two real numbers $a, b$, we have also $a\xi + b\eta \in \mathfrak{M}$.
A closed linear manifold is a linear manifold that is closed in the set-theoretic
sense. If $S$ is any subset of $\mathfrak{B}$, then the set, $[S]$, of all finite linear combinations of
elements of $S$ is a linear manifold; it is the linear manifold spanned by $S$. The
closure of $[S]$, denoted by $\{S\}$, is called the closed linear manifold spanned by $S$.
In general, $[S]$ is a proper subset of $\{S\}$. A set $S \subseteq \mathfrak{B}$ is called fundamental
when $\{S\} = \mathfrak{B}$.

A linear functional, $G$, on $\mathfrak{B}$ is a real-valued function with the property
that for any two elements $\xi, \eta \in \mathfrak{B}$ and any two real numbers $a, b$, we have
$G(a\xi + b\eta) = aG(\xi) + bG(\eta)$. The linear functional $G$ is said to be bounded when
the number

\[ \|G\| = \operatorname*{l.u.b.}_{\|\xi\| \neq 0} \frac{|G(\xi)|}{\|\xi\|} \]

is finite. $\|G\|$ is called the norm of $G$. (Throughout the text of the paper, the
qualification "bounded" has been understood in all references to linear functionals.)
If we define the sum of two linear functionals $F$ and $G$ by
$(F + G)(\xi) = F(\xi) + G(\xi)$, and make the other requisite definitions in the obvious way,
we find that the bounded linear functionals on $\mathfrak{B}$ form a linear vector space
over the reals. The function $\| \cdot \|$ on the bounded linear functionals, which we
have already called a norm, is in fact a norm in the Banach space sense. This
vector space, so normed, is readily shown to be complete. Hence it is a Banach
space, usually called the conjugate space to $\mathfrak{B}$. It is this space we have referred
to in the text as the space of linear functionals on $\mathfrak{B}$.

If a sequence $\{\xi_n\}$ of elements of $\mathfrak{B}$ has the property that $\lim_{n \to \infty} G(\xi_n) = G(\xi)$
for every bounded linear functional $G$, then $\xi_n$ is said to converge weakly to $\xi$.
If, of the sequence $\{\xi_n\}$, we know only that $\lim_{n \to \infty} G(\xi_n)$ exists for every bounded
linear functional, we say simply that the sequence is weakly convergent. The
space $\mathfrak{B}$ is called weakly complete if every weakly convergent sequence converges
weakly to a limit. The spaces $\mathfrak{L}_r$, $r \geq 1$, are weakly complete. $\mathfrak{B}$ is said to be
weakly compact if every bounded set $S \subseteq \mathfrak{B}$ contains a weakly convergent
sequence. That $S$ is "bounded" means $\operatorname*{l.u.b.}_{\xi \in S} \|\xi\| < \infty$.

A real Hilbert space $\mathfrak{H}$ is a real Banach space on which there is defined an




inner product; that is, a function $(\xi, \eta)$ on pairs of elements $\xi, \eta$, with the
properties

\[ (\xi, \eta) = (\eta, \xi), \]

\[ (a\xi, \eta) = a(\xi, \eta), \]

\[ (\xi + \zeta, \eta) = (\xi, \eta) + (\zeta, \eta), \]

\[ \|\xi\|^2 = (\xi, \xi). \]

The inner product is a continuous function of both its arguments; i.e., $\lim \xi_m = \xi$
and $\lim \eta_n = \eta$ imply $\lim (\xi_m, \eta_n) = (\xi, \eta)$. The space $\mathfrak{L}_2$ in the text is a Hilbert
space when we take $(\xi, \eta) = \int_\Omega \xi \eta \, d\nu$. Two elements $\xi, \eta$ which are such that
$(\xi, \eta) = 0$ are said to be orthogonal. If $S$ is any set in $\mathfrak{H}$, then the set of elements
of $\mathfrak{H}$ each of which is orthogonal to every element of $S$ is called the orthocomplement
of $S$, and is denoted by $S^{\perp}$.

For further elaboration the reader is referred to [13] and [19].

REFERENCES 

[1] D. Blackwell and M. A. Girshick, "A lower bound for the variance of some unbiased
sequential estimates," Annals of Math. Stat., Vol. 18 (1947), pp. 277-280.

[2] R. A. Fisher, "Theory of statistical estimation," Camb. Phil. Soc. Proc., Vol. 22
(1925), pp. 700-725.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton Press, Princeton, 1946.

[4] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation," Sankhyā, Vol. 8 (1946), pp. 1-14.

[5] M. A. Girshick, F. Mosteller, and L. J. Savage, "Unbiased estimates for certain
binomial sampling problems," Annals of Math. Stat., Vol. 17 (1946), pp. 13-23.

[6] J. Neyman, "Su un teorema concernente le cosidette statistiche sufficienti," Giornale
dell'Istituto Italiano degli Attuari, Vol. 6 (1935), pp. 320-334.

[7] D. Blackwell, "Conditional expectation and unbiased sequential estimation,"
Annals of Math. Stat., Vol. 18 (1947), pp. 105-110.

[8] H. Cramér, "A contribution to the theory of statistical estimation," Skand. Aktuar.
tids., Vol. 29 (1946), pp. 85-94.

[9] A. Bhattacharyya, "On some analogues of the amount of information and their use
in statistical estimation (cont'd)," Sankhyā, Vol. 8 (1947), pp. 201-218.

[10] J. Wolfowitz, "The efficiency of sequential estimates and Wald's equation for
sequential processes," Annals of Math. Stat., Vol. 18 (1947), pp. 215-230.

[11] F. Riesz, Les Systèmes d'Équations Linéaires à une Infinité d'Inconnues, Gauthier-Villars,
Paris, 1913.

[12] F. Riesz, "Untersuchungen über Systeme integrierbarer Funktionen," Math. Annalen,
Vol. 69 (1910), pp. 449-497.

[13] S. Banach, Théorie des Opérations Linéaires, Garasinski, Warsaw, 1932.

[14] N. Dunford, "Uniformity in linear spaces," Am. Math. Soc. Trans., Vol. 44 (1938),
pp. 305-356.

[15] M. H. Steinhaus, "Additive und stetige Funktionaloperationen," Math. Zeits., Vol.
5 (1919), pp. 186-221.

[16] F. Riesz, "Sur une espèce de géométrie analytique des systèmes de fonctions sommables,"
Comptes Rendus, Vol. 144 (1907), pp. 1409-1411.

[17] M. Fréchet, "Sur les ensembles de fonctions et les opérations linéaires," Comptes
Rendus, Vol. 144 (1907), pp. 1414-1416.

[18] S. Saks, Theory of the Integral, Stechert, New York, 1937.

[19] J. von Neumann, Functional Operators, (Mimeographed notes) Princeton, 1935.

[20] G. R. Seth, "On the variance of estimates," Annals of Math. Stat., Vol. 20 (1949),
pp. 1-27.

[21] B. J. Pettis, "A note on regular Banach spaces," Am. Math. Soc. Bull., Vol. 44 (1938),
pp. 420-428.



A SEQUENTIAL DECISION PROCEDURE FOR CHOOSING ONE OF 
THREE HYPOTHESES CONCERNING THE UNKNOWN 
MEAN OF A NORMAL DISTRIBUTION 

By Milton Sobel and Abraham Wald 1 
Columbia University 

1. Introduction. In this paper a multi-decision problem is investigated from
a sequential viewpoint and compared with the best non-sequential procedure
available. Multi-decision problems occur often in practice but methods to deal
with such problems are not yet sufficiently developed.

The problem under consideration here is a 3-decision problem: Given a chance
variable which is normally distributed with known variance $\sigma^2$, but unknown
mean $\theta$, and given two real numbers $a_1 < a_2$, the problem is to choose one of the
three mutually exclusive and exhaustive hypotheses

\[ H_1: \theta < a_1 \qquad H_2: a_1 \leq \theta \leq a_2 \qquad H_3: \theta > a_2 . \]

In order to select a proper sequential decision procedure, the parameter space
is subdivided into 5 mutually exclusive and exhaustive zones in the following
manner. Around $a_1$ there exists an interval $(\theta_1, \theta_2)$ in which we have no strong
preference between $H_1$ and $H_2$ but prefer (strongly) to reject $H_3$. Around $a_2$
there exists an interval $(\theta_3, \theta_4)$ in which we have no strong preference between
$H_2$ and $H_3$ but prefer (strongly) to reject $H_1$. For $\theta \leq \theta_1$ we prefer to accept $H_1$.
For $\theta_2 \leq \theta \leq \theta_3$ we prefer to accept $H_2$. For $\theta \geq \theta_4$ we prefer to accept $H_3$.

The intervals $(\theta_1, \theta_2)$ and $(\theta_3, \theta_4)$ will be called indifference zones. The determination
of these indifference zones is not a statistical problem but should
be made on practical considerations concerning the consequences of a wrong
decision.

In accordance with the above we define a wrong decision in the following
way. For $\theta \leq \theta_1$, acceptance of $H_2$ or $H_3$ is wrong. For $\theta_1 < \theta < \theta_2$, acceptance of
$H_3$ is wrong. For $\theta_2 \leq \theta \leq \theta_3$, acceptance of $H_1$ or $H_3$ is wrong. For $\theta_3 < \theta < \theta_4$,
acceptance of $H_1$ is wrong. For $\theta \geq \theta_4$, acceptance of $H_1$ or $H_2$ is wrong.
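
The five zones and the corresponding wrong decisions can be summarized in a few lines of code. This is a minimal sketch; the function name and the string labels for the hypotheses are hypothetical and serve only to restate the definition just given.

```python
def wrong_decisions(theta, theta1, theta2, theta3, theta4):
    """Return the set of hypotheses whose acceptance is wrong at theta.

    Assumes theta1 < theta2 <= theta3 < theta4, the zone boundaries
    described in the text.
    """
    if theta <= theta1:
        return {"H2", "H3"}
    if theta < theta2:          # first indifference zone
        return {"H3"}
    if theta <= theta3:
        return {"H1", "H3"}
    if theta < theta4:          # second indifference zone
        return {"H1"}
    return {"H1", "H2"}
```

For example, with boundaries $(-1.5, -0.5, 0.5, 1.5)$, only acceptance of $H_3$ is wrong at $\theta = -1$, since that point lies in the first indifference zone.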

The requirements on our decision procedure necessary to limit the probability
of a wrong decision are investigated. Two cases are considered.

Case 1: Prob. of a wrong decision $\leq \gamma$ for all $\theta$.

Case 2: Prob. of a wrong decision $\leq \gamma_1$ for $\theta \leq \theta_1$,
        Prob. of a wrong decision $\leq \gamma_2$ for $\theta_1 < \theta < \theta_4$,
        Prob. of a wrong decision $\leq \gamma_3$ for $\theta \geq \theta_4$.

The decision procedure discussed in the present paper is not an optimum 
procedure since, as will be seen later, the final decision at the termination of 

1 Work done under the sponsorship of the Office of Naval Research. 



experimentation is not in every case a function of only "the sample mean of all
the observations", although the sample mean is a sufficient statistic for $\theta$. Although
the procedure considered is not optimal it is suggested for the following
reasons:

1. The decision procedure can be carried out simply. In fact tables can be constructed
before experimentation starts that render the procedure completely
mechanical.

2. The derivation of the operating characteristic (OC) function, neglecting the
excess of the cumulative sum over the boundary, is accomplished with little
difficulty. In general, for other multi-decision problems it is unknown how to
obtain the OC function.

3. It is believed that the loss of efficiency is not serious, i.e., the suggested
sequential procedure is not far from being optimum. In this connection a non-sequential
procedure is compared with this sequential procedure. The results
show that, for the same maximum probability of making a wrong decision, the
sequential procedure requires on the average substantially fewer observations to
reach a final decision. In fact, for Case 1 noted above, if $.008 < \gamma < .1$, and if
certain symmetrical features are assumed, then the fixed number of observations
required by the non-sequential method is greater than the maximum of the
average sample number (ASN) function taken over all values of $\theta$.

It was found necessary in the course of the investigation to put an upper bound
on the quantity $\dfrac{\theta_2 - \theta_1}{a_2 - a_1}$ in order that the methods used to obtain upper and lower
bounds for the ASN function should give close results. This restriction, however,
is likely to be satisfied in practical applications.

All formulas for ASN and OC functions which will be used in this paper will be
approximation formulas neglecting the excess of the cumulative sum over the
boundaries. Nevertheless, equality signs will be used in these formulas, except
when additional approximations are involved.

2. Description of the Decision Procedure.$^2$ We shall assume that the indifference
zones described above have the following properties:

(i) $\theta_1 < a_1 < \theta_2 \leq \theta_3 < a_2 < \theta_4$

(ii) $\theta_1 + \theta_2 = 2a_1$; $\theta_3 + \theta_4 = 2a_2$

(iii) $\theta_2 - \theta_1 = \theta_4 - \theta_3 = \Delta$ (say).

$^2$ A similar decision procedure was used by P. Armitage [2] as an alternative to the
sequential t test (with 2-sided alternatives). The form used there is more restricted as he
considers only the case $\theta_2 = \theta_3$. Essential inequalities on the OC function are pointed out
but no attempt is made to determine the complete OC and ASN functions. A closely related
but somewhat different procedure for dealing with a trichotomy was suggested by Milton
Friedman while he was a member of the Statistical Research Group of Columbia University.
As far as the authors are aware, no results were obtained concerning the OC and ASN functions
of Friedman's procedure.





Let $R_1$ denote the Sequential Probability Ratio Test for testing the hypothesis
that $\theta = \theta_1$ against the hypothesis that $\theta = \theta_2$. We assume for the present that
either the proper constants $A$, $B$ in the probability ratio test are given or that
they are approximated from given $\alpha$, $\beta$ by the relations

    $A = \frac{1 - \beta}{\alpha}, \qquad B = \frac{\beta}{1 - \alpha}.$

Here $\alpha$ and $\beta$ are upper bounds on the probabilities of first and second types of
errors, respectively.

Let $R_2$ represent the S.P.R.T. for testing the hypothesis that $\theta = \theta_3$ against the
alternative that $\theta = \theta_4$. For this test we assume that $(\alpha, \beta, A, B)$ are replaced
by $(\hat{\alpha}, \hat{\beta}, \hat{A}, \hat{B})$ and as above that either $\hat{A}$ and $\hat{B}$ are given or that they are
approximated from given $\hat{\alpha}$, $\hat{\beta}$.

The decision procedure is carried out as follows: 

Both $R_1$ and $R_2$ are computed at each stage of the inspection until

Either: One ratio leads to a decision to stop before the other. Then the former
is no longer computed and the latter is continued until it leads to a decision to
stop.

Or: Both $R_1$ and $R_2$ lead to a decision to stop at the same stage. In this event
both computations are discontinued.

The following table gives the rule $R$ for the decisions to be made corresponding
to all possible outcomes of $R_1$ and $R_2$.


      $R_1$                        $R_2$                         $R$

If    accepts $\theta_1$    and    accepts $\theta_3$    then    accepts $H_1$

If    accepts $\theta_2$    and    accepts $\theta_3$    then    accepts $H_2$

If    accepts $\theta_2$    and    accepts $\theta_4$    then    accepts $H_3$


We shall show that acceptance of both $\theta_1$ and $\theta_4$ is impossible when $(A, B) = (\hat{A}, \hat{B})$.
For this purpose we need the acceptance number and rejection number
formulas (see page 119 of [1]).

          Acceptance Number                                   Rejection Number

$R_1$:   $\frac{\sigma^2}{\Delta} \log B + a_1 n \;<\; \sum_{\alpha=1}^{n} x_\alpha \;<\; \frac{\sigma^2}{\Delta} \log A + a_1 n$

$R_2$:   $\frac{\sigma^2}{\Delta} \log \hat{B} + a_2 n \;<\; \sum_{\alpha=1}^{n} x_\alpha \;<\; \frac{\sigma^2}{\Delta} \log \hat{A} + a_2 n.$

We shall assume for convenience that "between observations" $R_1$ is tested before
$R_2$ and let the term "initial decision" refer to the first decision made.
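The rule $R$ described above, together with the acceptance and rejection numbers, can be sketched in code. The following is a minimal illustration only, not the authors' implementation; the function name `sobel_wald_rule` and its argument layout are hypothetical, and the hatted constants default to the unhatted ones.

```python
import math

def sobel_wald_rule(xs, thetas, sigma, A, B, A_hat=None, B_hat=None):
    """Hypothetical helper: run both SPRTs on the observations xs.

    thetas = (theta1, theta2, theta3, theta4). Returns (hypothesis, n)
    or None if the data run out before both tests have stopped.
    """
    A_hat = A if A_hat is None else A_hat
    B_hat = B if B_hat is None else B_hat
    t1, t2, t3, t4 = thetas
    delta = t2 - t1                      # assumed equal to t4 - t3
    a1, a2 = (t1 + t2) / 2, (t3 + t4) / 2
    c = sigma ** 2 / delta
    r1 = r2 = None                       # index of the theta each test accepts
    s = 0.0
    for n, x in enumerate(xs, start=1):
        s += x
        if r1 is None:                   # R1: theta1 against theta2
            if s <= c * math.log(B) + a1 * n:
                r1 = 1
            elif s >= c * math.log(A) + a1 * n:
                r1 = 2
        if r2 is None:                   # R2: theta3 against theta4
            if s <= c * math.log(B_hat) + a2 * n:
                r2 = 3
            elif s >= c * math.log(A_hat) + a2 * n:
                r2 = 4
        if r1 is not None and r2 is not None:
            table = {(1, 3): "H1", (2, 3): "H2", (2, 4): "H3"}
            if (r1, r2) not in table:
                raise ValueError("theta1 and theta4 both accepted")
            return table[(r1, r2)], n
    return None
```

With constant inputs the stopping stage can be checked by hand against the boundary lines, which makes the sketch easy to sanity-test before using it on random data.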

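Assume $\theta_1$ and $\theta_4$ are both accepted. Then if $\theta_1$ is accepted initially at the $m$th
stage,

    $\sum_{\alpha=1}^{m} x_\alpha \le \frac{\sigma^2}{\Delta} \log B + a_1 m.$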




A. SEQUENTIAL DECISION PROCEDURE 




Since

    $\frac{\sigma^2}{\Delta} \log B + a_1 m < \frac{\sigma^2}{\Delta} \log B + a_2 m$

it follows that $\theta_4$ is rejected at the same stage, contradicting the hypothesis.
Similarly if $\theta_4$ is accepted initially at the $m$th stage, then

    $\sum_{\alpha=1}^{m} x_\alpha \ge \frac{\sigma^2}{\Delta} \log A + a_2 m.$

Since

    $\frac{\sigma^2}{\Delta} \log A + a_2 m > \frac{\sigma^2}{\Delta} \log A + a_1 m$

it follows that $\theta_1$ is rejected at the same or at an earlier stage, contradicting the
assumption that the acceptance of $\theta_4$ is an initial decision. Hence $\theta_1$ and $\theta_4$ cannot
both be accepted.

A geometrical representation of the rule R is given in Figure 1. 

$R$ can now be described as follows: Continue taking observations until an
acceptance region (shaded area) is reached or both dashed lines are crossed. In
the former case, stop and accept as shown above. In the latter case stop and
accept $H_2$.

The proof above that $\theta_1$ and $\theta_4$ cannot both be accepted consists of noting that
a point below the acceptance line for $\theta_1$ is already below the rejection line for
$\theta_4$ and that a point above the acceptance line for $\theta_4$ is already above the rejection
line for $\theta_1$.

If $(A, B) \ne (\hat{A}, \hat{B})$, a necessary and sufficient condition for the impossibility
of accepting $\theta_1$ and $\theta_4$ is that at $n = 1$ the following inequalities should hold.

Rejection Number (of $\theta_1$) for $R_1$ $\le$ Rejection Number (of $\theta_3$) for $R_2$

and

Acceptance Number (of $\theta_1$) for $R_1$ $\le$ Acceptance Number (of $\theta_3$) for $R_2$.

In symbols

    $\frac{\sigma^2}{\Delta} \log A + a_1 \le \frac{\sigma^2}{\Delta} \log \hat{A} + a_2$

and

    $\frac{\sigma^2}{\Delta} \log B + a_1 \le \frac{\sigma^2}{\Delta} \log \hat{B} + a_2.$

These can be written as

    $\frac{A}{\hat{A}} \le e^{d\Delta/\sigma^2}$  and  $\frac{B}{\hat{B}} \le e^{d\Delta/\sigma^2}$

respectively, where $d = a_2 - a_1$.





Since $d\Delta/\sigma^2 > 0$, the above inequalities are certainly fulfilled when

(2.1)    $\frac{A}{\hat{A}} \le 1$  and  $\frac{B}{\hat{B}} \le 1.$

In what follows in this paper, we shall restrict ourselves to cases where acceptance
of both $\theta_1$ and $\theta_4$ is impossible, even if this is not stated explicitly.



Figure 1

3. Derivation of OC Functions. Let $L(H_i \mid \theta, R)$ denote the probability of
accepting $H_i$ when $\theta$ is the true mean and $R$ is the sequential rule used. Let $H_{\theta_i}$
denote the hypothesis that $\theta = \theta_i$. Since, as shown above, $H_1$ is accepted if and
only if $\theta_1$ is accepted, we have

(3.1)    $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1).$

Similarly,

(3.2)    $L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2).$





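From the fact that $R_1$ and $R_2$ each terminate at some finite stage with probability
one, it follows that $R$ will terminate at some finite stage with probability
one. Hence

(3.3)    $L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R).$

From pp. 50-52 of [1], the following equations are obtained.

(3.4)    $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1) = \frac{A^{h_1} - 1}{A^{h_1} - B^{h_1}}$

where

    $h_1 = h_1(\theta) = \frac{\theta_2 + \theta_1 - 2\theta}{\theta_2 - \theta_1} = \frac{a_1 - \theta}{\Delta/2}$

and

(3.5)    $L(H_{\theta_3} \mid \theta, R_2) = \frac{\hat{A}^{h_2} - 1}{\hat{A}^{h_2} - \hat{B}^{h_2}}$

where

    $h_2 = h_2(\theta) = \frac{\theta_4 + \theta_3 - 2\theta}{\theta_4 - \theta_3} = \frac{a_2 - \theta}{\Delta/2}.$

These equations involve an approximation, as explained in [1].
Hence

(3.6)    $L(H_3 \mid \theta, R) = L(H_{\theta_4} \mid \theta, R_2) = 1 - L(H_{\theta_3} \mid \theta, R_2) = \frac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}}$

and

(3.7)    $L(H_2 \mid \theta, R) = 1 - \frac{A^{h_1} - 1}{A^{h_1} - B^{h_1}} - \frac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}} = \frac{1 - B^{h_1}}{A^{h_1} - B^{h_1}} - \frac{1 - \hat{B}^{h_2}}{\hat{A}^{h_2} - \hat{B}^{h_2}}.$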

Since $L(H_1 \mid \theta, R) = L(H_{\theta_1} \mid \theta, R_1)$, it follows that $L(H_1 \mid \theta, R)$ is a
monotonically decreasing function of $\theta$ and that

    $L(H_1 \mid -\infty, R) = 1$;  $L(H_1 \mid \infty, R) = 0$

    $L(H_1 \mid \theta_1, R) = 1 - \alpha$;  $L(H_1 \mid \theta_2, R) = \beta$

    $L(H_1 \mid a_1, R) = \frac{\log A}{\log A + |\log B|}.$

Similarly, since $L(H_3 \mid \theta, R) = 1 - L(H_{\theta_3} \mid \theta, R_2)$, it follows that $L(H_3 \mid \theta, R)$
is a monotonically increasing function of $\theta$ and that

    $L(H_3 \mid -\infty, R) = 0$;  $L(H_3 \mid \infty, R) = 1.$





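Since $L(H_2 \mid \theta, R) = 1 - L(H_1 \mid \theta, R) - L(H_3 \mid \theta, R)$ it follows easily from the
above results that

    $L(H_2 \mid -\infty, R) = 0$;  $L(H_2 \mid \infty, R) = 0$

    $L(H_2 \mid \theta, R) < \alpha$ for $\theta < \theta_1$;  $L(H_2 \mid \theta, R) < \hat{\beta}$ for $\theta > \theta_4$

    $\frac{|\log B|}{\log A + |\log B|} - \hat{\alpha} < L(H_2 \mid a_1, R) < \frac{|\log B|}{\log A + |\log B|}$

    $\frac{\log \hat{A}}{\log \hat{A} + |\log \hat{B}|} - \beta < L(H_2 \mid a_2, R) < \frac{\log \hat{A}}{\log \hat{A} + |\log \hat{B}|}$

    $1 - \beta - \hat{\alpha} < L(H_2 \mid \theta, R) < 1$ for $\theta_2 \le \theta \le \theta_3$.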


4. Probability of Correct Decision. Denote the probability of a correct
decision by $L(\theta \mid R)$. It is defined as follows:

Interval                           Correct Decisions                  $L(\theta \mid R)$

$\theta \le \theta_1$                        acceptance of $H_1$                $L(H_1 \mid \theta, R)$

$\theta_1 < \theta < \theta_2$               acceptance of $H_1$ or $H_2$       $L(H_1 \mid \theta, R) + L(H_2 \mid \theta, R)$

$\theta_2 \le \theta \le \theta_3$           acceptance of $H_2$                $L(H_2 \mid \theta, R)$

$\theta_3 < \theta < \theta_4$               acceptance of $H_2$ or $H_3$       $L(H_2 \mid \theta, R) + L(H_3 \mid \theta, R)$

$\theta_4 \le \theta$                        acceptance of $H_3$                $L(H_3 \mid \theta, R)$

It should be noted that at points of discontinuity, $L(\theta \mid R)$ is defined as the
smaller of the two limiting values.

We shall now discuss some monotonicity properties of the function $L(\theta \mid R)$.
From the fact that $L(H_{\theta_1} \mid \theta, R_1)$ and $L(H_{\theta_3} \mid \theta, R_2)$ are continuous with continuous
first and second derivatives and are monotonically decreasing for all
$\theta$ with a single point of inflection in the intervals $\theta_1 < \theta < \theta_2$ and $\theta_3 < \theta < \theta_4$,
respectively, it follows that

(i) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $-\infty < \theta \le \theta_1$.

(ii) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_4 \le \theta < \infty$.

Making use of (3.3) we have further

(iii) $L(\theta \mid R)$ is monotonically decreasing with negative curvature for $\theta_1 < \theta < \theta_2$.

(iv) $L(\theta \mid R)$ is monotonically increasing with negative curvature for $\theta_3 < \theta < \theta_4$.

(v) For $\theta_2 \le \theta \le \theta_3$, $\frac{d}{d\theta} L(\theta \mid R) = -\left[\frac{d}{d\theta} L(H_1 \mid \theta, R) + \frac{d}{d\theta} L(H_3 \mid \theta, R)\right]$ is
decreasing, since $\frac{d}{d\theta} L(H_1 \mid \theta, R)$ and $\frac{d}{d\theta} L(H_3 \mid \theta, R)$ are increasing. In
other words $L(\theta \mid R)$ has negative curvature for $\theta_2 \le \theta \le \theta_3$.





In the special case when $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}}$ and the origin is taken at $\frac{a_1 + a_2}{2}$
for the sake of convenience, it is easy to see that $L(\theta \mid R)$ is symmetric with
respect to the origin and, because of (v), has a local maximum at $\theta = 0$.

5. Choice of the constants $A$, $B$, $\hat{A}$, $\hat{B}$ to insure prescribed Lower Bounds
for $L(\theta \mid R)$. We shall deal here with the question of choosing $A$, $B$, $\hat{A}$ and $\hat{B}$
such that $L(\theta \mid R) \ge 1 - \gamma_1$ when $\theta \le \theta_1$, $L(\theta \mid R) \ge 1 - \gamma_2$ when $\theta_2 \le \theta \le \theta_3$,
and $L(\theta \mid R) \ge 1 - \gamma_3$ when $\theta \ge \theta_4$. From the monotonic properties of the correct
decision function it is only necessary to insure that

(5.1)    $L(\theta_1 \mid R) = 1 - \gamma_1$,  $L(\theta_2 \mid R) = L(\theta_3 \mid R) = 1 - \gamma_2$  and  $L(\theta_4 \mid R) = 1 - \gamma_3$.
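The following relations will be needed:

    $h_1(\theta_1) = h_2(\theta_3) = 1 = -h_1(\theta_2) = -h_2(\theta_4)$

    $h_2(\theta_2) = \frac{\theta_3 + \theta_4 - 2\theta_2}{\Delta} = \frac{a_2 - \theta_2}{\Delta/2} = r$ (say)

    $h_1(\theta_3) = \frac{\theta_1 + \theta_2 - 2\theta_3}{\Delta} = \frac{a_1 - \theta_3}{\Delta/2} = -r$

where $d = \theta_4 - \theta_2 = \theta_3 - \theta_1 = a_2 - a_1$, so that $r = 2d/\Delta - 1$.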

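The following four equations are obtained from (5.1):

(5.2)    $1 - L(H_1 \mid \theta_1, R) = L(H_{\theta_2} \mid \theta_1, R_1) = \frac{1 - B}{A - B} = \gamma_1$

(5.3)    $1 - L(H_2 \mid \theta_2, R) = L(H_1 \mid \theta_2, R) + L(H_3 \mid \theta_2, R) = \frac{B(A - 1)}{A - B} + \left[\frac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r}\right] = \gamma_2$

(5.4)    $1 - L(H_2 \mid \theta_3, R) = L(H_3 \mid \theta_3, R) + L(H_1 \mid \theta_3, R) = \frac{1 - \hat{B}}{\hat{A} - \hat{B}} + \left[\frac{B^r(A^r - 1)}{A^r - B^r}\right] = \gamma_2$

(5.5)    $1 - L(H_3 \mid \theta_4, R) = L(H_{\theta_3} \mid \theta_4, R_2) = \frac{\hat{B}(\hat{A} - 1)}{\hat{A} - \hat{B}} = \gamma_3.$

The "bracketed terms" represent quantities less than $\hat{\alpha}$ and $\beta$ respectively and
if $r$ is sufficiently large they can be neglected. This will be made more precise
but first let us note the results of neglecting the bracketed terms.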

From (5.2) and (5.3) we obtain

(5.6)    $B(1 - \gamma_1) = \gamma_2$, whence $B = \frac{\gamma_2}{1 - \gamma_1}.$





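From (5.2) and (5.6)

(5.7)    $A = \frac{1 - B(1 - \gamma_1)}{\gamma_1}$, whence $A = \frac{1 - \gamma_2}{\gamma_1}.$

Since the last two equations are obtained from the first two by the permutation
$A \to 1/\hat{B}$, $B \to 1/\hat{A}$, $\gamma_1 \to \gamma_3$, $\gamma_2 \to \gamma_2$, we have

    $\hat{B} = \frac{\gamma_3}{1 - \gamma_2}$,    $\hat{A} = \frac{1 - \gamma_3}{\gamma_2}.$

If $\gamma_1 = \gamma_2 = \gamma_3 = \gamma$ (say) then $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}} = \frac{1 - \gamma}{\gamma}$.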

We shall consider the bracketed quantities negligible if the result of neglecting
them produces a change of less than 20% in $[1 - L(\theta \mid R)]$ at $\theta = \theta_2$, $\theta_3$ respectively,
i.e., if

(5.8)    $\frac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r} \le \frac{\gamma_2}{5}$

and

(5.9)    $\frac{B^r(A^r - 1)}{A^r - B^r} \le \frac{\gamma_2}{5}.$

Inequality (5.9) can be written as

    $\frac{\gamma_2^r\left[(1 - \gamma_2)^r - \gamma_1^r\right]}{(1 - \gamma_2)^r (1 - \gamma_1)^r - \gamma_1^r \gamma_2^r} \le \frac{\gamma_2}{5}.$

This will certainly hold if

    $\gamma_2^r \le \frac{\gamma_2}{5}(1 - \gamma_1)^r$

or if

    $\left(\frac{\gamma_2}{1 - \gamma_1}\right)^r \le \frac{\gamma_2}{5}.$

Assume that $\gamma_1$, $\gamma_2$ and $\gamma_3$ are each less than $\frac{1}{2}$. Then the last inequality can be
written as

(5.10)    $r \ge \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \gamma_1}{\gamma_2}}.$
Starting with (5.8) the same relation is obtained except that $\gamma_1$ is replaced by
$\gamma_3$, namely

(5.11)    $r \ge \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \gamma_3}{\gamma_2}}.$

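Let

    $k = \frac{\log \frac{5}{\gamma_2}}{\log \frac{1 - \bar{\gamma}}{\gamma_2}}$

where $\bar{\gamma}$ is the larger of $\gamma_1$ and $\gamma_3$. Then $k$ is the larger of the right hand members
of (5.10) and (5.11). Then for (5.8) and (5.9) to hold it is sufficient that

    $r \ge k.$

If $\gamma_2 = .05$ and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately $\frac{2}{1.3} = 1.54$. If $\gamma_2 = .01$
and $0 < \gamma_1, \gamma_3 < .1$ then $k$ is approximately $\frac{2.7}{2} = 1.35$.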
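The choice of constants in (5.6) and (5.7), the threshold $k$, and the sufficient condition (2.1) lend themselves to a small numerical check. The sketch below is illustrative only; the function names are hypothetical and natural logarithms are used (the ratio defining $k$ does not depend on the base).

```python
import math

def constants(g1, g2, g3):
    """Approximate A, B, A_hat, B_hat from (5.6), (5.7) and their
    counterparts, neglecting the bracketed terms."""
    A = (1.0 - g2) / g1
    B = g2 / (1.0 - g1)
    A_hat = (1.0 - g3) / g2
    B_hat = g3 / (1.0 - g2)
    return A, B, A_hat, B_hat

def k_threshold(g1, g2, g3):
    """Larger right-hand member of (5.10) and (5.11)."""
    g_bar = max(g1, g3)
    return math.log(5.0 / g2) / math.log((1.0 - g_bar) / g2)

def satisfies_2_1(A, B, A_hat, B_hat):
    """Sufficient condition (2.1): theta1 and theta4 cannot both be accepted."""
    return A / A_hat <= 1.0 and B / B_hat <= 1.0
```

For example, with all three $\gamma$'s equal to .029 the sketch returns $A \approx 33.5$ and $k \approx 1.47$.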

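We shall now investigate under what conditions the approximate solutions
obtained above for $A$, $B$, $\hat{A}$, $\hat{B}$ are such that acceptance of both $\theta_1$ and $\theta_4$ is
impossible.

It follows from (2.1) that the following pair of inequalities are sufficient for
the impossibility of accepting both $\theta_1$ and $\theta_4$:

(5.12)    $\frac{A}{\hat{A}} = \frac{\gamma_2 (1 - \gamma_2)}{\gamma_1 (1 - \gamma_3)} \le 1$;    $\frac{B}{\hat{B}} = \frac{\gamma_2 (1 - \gamma_2)}{\gamma_3 (1 - \gamma_1)} \le 1.$

If $\gamma_1 \ne \gamma_3$ let the smaller and larger of the pair $(\gamma_1, \gamma_3)$ be denoted by $\underline{\gamma}$ and
$\bar{\gamma}$ respectively. Since $1 - \underline{\gamma} > 1 - \bar{\gamma}$ then

    $\frac{\gamma_2 (1 - \gamma_2)}{\bar{\gamma}(1 - \underline{\gamma})} < \frac{\gamma_2 (1 - \gamma_2)}{\underline{\gamma}(1 - \bar{\gamma})}$

and we need only consider one of the two inequalities in (5.12). The condition
$\gamma_2 \le \underline{\gamma}$ will in general satisfy (5.12). More precisely if all the $\gamma$'s are restricted
to the interval (0, .1) then

    $\frac{9}{10} < \frac{1 - \gamma_2}{1 - \underline{\gamma}} \le \frac{1 - \gamma_2}{1 - \bar{\gamma}} < \frac{10}{9}$

and it is sufficient for the validity of (5.12) that $\gamma_2 \le (.9)\underline{\gamma}$.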





If $\gamma_1 = \gamma_3 = \gamma$ (say) then the two inequalities reduce to one

    $\gamma_2 - \gamma_2^2 + \gamma^2 - \gamma \le 0$

which can be written as

    $(\gamma_2 - \gamma)(\gamma_2 - 1 + \gamma) \ge 0.$

Since the inequality $\gamma_2 \ge 1 - \gamma$ is impossible when all $\gamma$'s are $< \frac{1}{2}$, we see that
$\gamma_2 \le \gamma$ is sufficient for the validity of (5.12) when $\gamma_1 = \gamma_3 = \gamma < \frac{1}{2}$.
There remains the problem of finding an approximate solution for equations
(5.2) to (5.5) when $r < k$. Since

    $r = \frac{d - \Delta/2}{\Delta/2} = \frac{\theta_3 - \theta_2 + \Delta/2}{\Delta/2} \ge 1$

we merely have to consider the interval $1 \le r < k$.
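The following approximations are used

(5.13)    $\frac{1 - B}{A - B} \approx \frac{1}{A}$;    $\frac{1 - \hat{B}}{\hat{A} - \hat{B}} \approx \frac{1}{\hat{A}}$;    $\frac{B(A - 1)}{A - B} \approx B$;

          $\frac{B^r(A^r - 1)}{A^r - B^r} \approx B^r$;    $\frac{1 - \hat{B}^r}{\hat{A}^r - \hat{B}^r} \approx \frac{1}{\hat{A}^r}$;    $\frac{\hat{B}(\hat{A} - 1)}{\hat{A} - \hat{B}} \approx \hat{B}$,

which upon substitution yield

(5.14)    $\frac{1}{A} = \gamma_1$

(5.15)    $\hat{B} = \gamma_3$

(5.16)    $B + \frac{1}{\hat{A}^r} = \gamma_2$

(5.17)    $\frac{1}{\hat{A}} + B^r = \gamma_2.$

Subtraction of (5.17) from (5.16) shows that $B = \frac{1}{\hat{A}}$ is a solution. Substituting
this result back in (5.16) leads to the equation

(5.18)    $B + B^r = \gamma_2.$

It can easily be verified that between zero and unity this equation has exactly
one root. Since $1 \le r < \infty$, the root of the above equation lies between $\frac{\gamma_2}{2}$ and
$\gamma_2$.

Taking $\gamma_2$ as a first approximation for $B$ and substituting $\gamma_2 + \epsilon$ for $B$ in
(5.18), we obtain

    $\epsilon + (\gamma_2 + \epsilon)^r = 0.$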





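Expanding $(\gamma_2 + \epsilon)^r$ in a power series in $\epsilon$ and neglecting second and higher
order terms, the above equation gives

    $\epsilon = -\frac{\gamma_2^r}{1 + r\gamma_2^{r-1}}.$

Thus,

(5.19)    $B = \frac{1}{\hat{A}} = \gamma_2 - \frac{\gamma_2^r}{1 + r\gamma_2^{r-1}} = \frac{\gamma_2\left[1 + (r - 1)\gamma_2^{r-1}\right]}{1 + r\gamma_2^{r-1}}.$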
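To see how close the first-order solution (5.19) is to the actual root of (5.18), one can compare it with a root found by bisection on the interval $[\gamma_2/2, \gamma_2]$. This is a hedged sketch; the function names are hypothetical and bisection is simply one convenient root finder, not the authors' method.

```python
def b_first_order(g2, r):
    """First-order approximation (5.19) to the root of B + B**r = g2."""
    return g2 - g2 ** r / (1.0 + r * g2 ** (r - 1.0))

def b_bisect(g2, r, tol=1e-12):
    """Root of B + B**r = g2 on [g2/2, g2], valid for r >= 1 and 0 < g2 < 1."""
    lo, hi = g2 / 2.0, g2
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mid + mid ** r - g2 > 0.0:
            hi = mid          # left side too large: root is below mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

At $r = 1$ the approximation reduces to $\gamma_2/2$, which is the exact root in that case.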


It is necessary to investigate under what conditions the above approximate
solution satisfies (5.2) to (5.5) to within a 20% error in $[1 - L(\theta \mid R)]$, i.e., such that

(5.20)    $-\frac{\gamma_1}{5} < \frac{\gamma_1(1 - B)}{1 - \gamma_1 B} - \gamma_1 < \frac{\gamma_1}{5}$

(5.21)    $-\frac{\gamma_3}{5} < \frac{\gamma_3(1 - B)}{1 - \gamma_3 B} - \gamma_3 < \frac{\gamma_3}{5}$

(5.22)    $-\frac{\gamma_2}{5} < \frac{B(1 - \gamma_1)}{1 - \gamma_1 B} + \frac{B^r(1 - \gamma_3^r)}{1 - \gamma_3^r B^r} - \gamma_2 < \frac{\gamma_2}{5}$

(5.23)    $-\frac{\gamma_2}{5} < \frac{B(1 - \gamma_3)}{1 - \gamma_3 B} + \frac{B^r(1 - \gamma_1^r)}{1 - \gamma_1^r B^r} - \gamma_2 < \frac{\gamma_2}{5}$

where for $B$ the value in (5.19) is understood.

It can be shown that if $\gamma_1$, $\gamma_2$, $\gamma_3$ are each between zero and .1 then the
inequalities (5.20) to (5.23) hold. Furthermore it can be shown that if, in addition,
$\gamma_2 \le \min(\gamma_1, \gamma_3)$ then also the inequalities (2.1) hold. The latter inequalities are
sufficient to ensure the impossibility of accepting both $\theta_1$ and $\theta_4$.


6. Bounds for the ASN Function. First we shall derive lower bounds for
the ASN function. Let $E(n/\theta, R)$ denote the expected value of $n$ when $\theta$ is the
true mean and $R$ is the sequential rule employed. For $\theta < \theta_2$ the probability of
coming to a decision first with $R_2$ is large and therefore

    $E(n/\theta, R) \approx E(n/\theta, R_1)$ for $\theta < \theta_2$.

From the definition of $R$ it follows that

    $E(n/\theta, R) \ge E(n/\theta, R_1)$ for all $\theta$.

Hence $E(n/\theta, R_1)$ serves as a close lower bound when $\theta < \theta_2$.

Similarly

    $E(n/\theta, R) \approx E(n/\theta, R_2)$ for $\theta > \theta_3$

    $E(n/\theta, R) \ge E(n/\theta, R_2)$ for all $\theta$.

Hence $E(n/\theta, R_2)$ serves as a close lower bound for $\theta > \theta_3$.

Combining the above we have

(6.1)    $E(n/\theta, R) \ge \mathrm{Max}\,[E(n/\theta, R_1),\ E(n/\theta, R_2)]$





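where, neglecting the excess over the boundary,

(6.2)    $E(n/\theta, R_1) = \frac{L(H_{\theta_1} \mid \theta, R_1) \log B + [1 - L(H_{\theta_1} \mid \theta, R_1)] \log A}{\frac{\Delta}{\sigma^2}(\theta - a_1)}$

(6.3)    $E(n/\theta, R_2) = \frac{L(H_{\theta_3} \mid \theta, R_2) \log \hat{B} + [1 - L(H_{\theta_3} \mid \theta, R_2)] \log \hat{A}}{\frac{\Delta}{\sigma^2}(\theta - a_2)}.$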

Formula (6.1) gives a valid lower bound over the whole range of $\theta$, but this
lower bound will not be very close in the interval $(\theta_2, \theta_3)$, particularly in the
neighbourhood of the mid-point $\frac{\theta_2 + \theta_3}{2}$. The authors were not able to find any
simple method for obtaining a closer lower bound in this interval. The upper
bound given later in this section will, however, be fairly close also in the interval
$(\theta_2, \theta_3)$ and can be used as an approximation to the exact value.

We shall now derive upper bounds for the ASN function. Let $R_1^*$ be the following
rule: "Continue to take observations until $R_1$ accepts $\theta_1$." Since this implies
the rejection of $\theta_4$ at the same or at a previous stage, it follows that $R$ must
terminate not later than $R_1^*$. Hence

(6.4)    $E(n/\theta, R_1^*) \ge E(n/\theta, R).$

As a matter of fact one can easily verify that $E(n/\theta, R_1^*) > E(n/\theta, R)$. Thus
$E(n/\theta, R_1^*)$ is an upper bound for $E(n/\theta, R)$. This upper bound will be close
when the probability of accepting $\theta_1$ is high, i.e., for $\theta \le \theta_1$.

By the general formula

    $E(n) = \frac{E(Z_n)}{E(z)}$

(see p. 53 [1]) we obtain, upon neglecting the excess over the boundary,

(6.5)    $E(n/\theta, R_1^*) = \frac{\log B}{\frac{\Delta}{\sigma^2}(\theta - a_1)}.$

This coincides with (6.2) when $L(H_{\theta_2} \mid \theta, R_1) = 0$.

Similarly, if $R_2^*$ denotes the rule of continuing until $R_2$ accepts $\theta_4$, then

(6.6)    $E(n/\theta, R_2^*) \ge E(n/\theta, R)$

(6.7)    $E(n/\theta, R_2^*) = \frac{\log \hat{A}}{\frac{\Delta}{\sigma^2}(\theta - a_2)}$

and this will be a close upper bound for $\theta \ge \theta_4$.

If $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}}$ and if $a_1 + a_2 = 0$ the above results reduce to

(6.8)    $E(n/\theta, R) \lesssim E(n/\theta, R_1^*) = \frac{-h}{\lambda + \theta}$ for $\theta \le \theta_1$

(6.9)    $E(n/\theta, R) \lesssim E(n/\theta, R_2^*) = \frac{h}{\theta - \lambda}$ for $\theta \ge \theta_4$

where the symbol $\lesssim$ stands for a close inequality, and where

    $h = \frac{\sigma^2}{\Delta} \log A$  and  $\lambda = a_2 = -a_1$.
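The lower bound (6.1) with (6.2), (6.3) and the outer upper bound (6.7) are straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and at $\theta = a_1$ or $\theta = a_2$, where (6.2) and (6.3) take the form 0/0, it uses Wald's zero-drift limit $-\log A \log B\, \sigma^2/\Delta^2$.

```python
import math

def sprt_oc(h, A, B):
    """Approximate OC value (A^h - 1)/(A^h - B^h), with its limit at h = 0."""
    if abs(h) < 1e-12:
        return math.log(A) / (math.log(A) - math.log(B))
    return (A ** h - 1.0) / (A ** h - B ** h)

def sprt_asn(theta, t_lo, t_hi, sigma, A, B):
    """Approximate ASN of the SPRT of t_lo against t_hi, as in (6.2), (6.3)."""
    delta = t_hi - t_lo
    a = (t_lo + t_hi) / 2.0
    drift = delta * (theta - a) / sigma ** 2
    if abs(drift) < 1e-12:
        # Zero-drift limit of the ASN formula.
        return -math.log(A) * math.log(B) * sigma ** 2 / delta ** 2
    L = sprt_oc((t_lo + t_hi - 2.0 * theta) / delta, A, B)
    return (L * math.log(B) + (1.0 - L) * math.log(A)) / drift

def asn_lower_bound(theta, thetas, sigma, A, B, A_hat, B_hat):
    """Lower bound (6.1): the larger of the two component ASN values."""
    t1, t2, t3, t4 = thetas
    return max(sprt_asn(theta, t1, t2, sigma, A, B),
               sprt_asn(theta, t3, t4, sigma, A_hat, B_hat))

def asn_upper_bound_right(theta, thetas, sigma, A_hat):
    """Upper bound (6.7), intended for theta >= theta4."""
    _, _, t3, t4 = thetas
    return math.log(A_hat) / ((t4 - t3) / sigma ** 2 * (theta - (t3 + t4) / 2.0))
```

Taking $\log A = 3.5$ (so that $h = 28$ when $\sigma = 1$ and $\Delta = 1/8$) gives, for instance, a lower bound of about 112 at $\theta = 0$ and an upper bound of 112 at $\theta = 8/16$.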


To establish an upper bound for the ASN function in the interval $\theta_2 < \theta < \theta_3$
we shall restrict ourselves to the case where $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}}$. These relations are
fulfilled by the approximate values of $A$, $B$, $\hat{A}$, $\hat{B}$ suggested in Section 5 when
$\gamma_1 = \gamma_2 = \gamma_3$ and $r \ge k$. We shall choose the origin to be at $\frac{a_1 + a_2}{2}$, i.e., we put
$\frac{a_1 + a_2}{2} = 0$. Then the vertex $P$ of the triangle $(P_1, P_2, P)$ in diagram 1 lies on
the abscissa axis and $OP_1 = OP_2 = h$. The abscissa of the vertex $P$ is $\frac{h}{\lambda} = N$ (say)
where $\lambda = a_2 = -a_1$. Let $y = \sum_{i=1}^{N} x_i$ represent the sum of the first $N$ observations.
Let $R_{23}$ denote the rule: "Continue until both $\theta_2$ and $\theta_3$ are accepted".
This is tantamount to neglecting the two outer lines in diagram 1, i.e., the acceptance
lines for $\theta_1$ and $\theta_4$. Then clearly,

(6.10)    $E(n/\theta, R_{23}) \ge E(n/\theta, R).$

When $\theta$ lies between $\theta_2$ and $\theta_3$ this inequality will be close, since the probability of
crossing either of the two outer lines is then small.
However $E(n/\theta, R_{23})$ was found difficult to compute and it was necessary to
consider instead the rule $R'_{23}$: "Take $N$ observations. If $y = \sum_{i=1}^{N} x_i < 0$ then
continue until $\theta_2$ is accepted. If $y > 0$ then continue until $\theta_3$ is accepted."$^3$ Clearly,

(6.11)    $E(n/\theta, R'_{23}) \ge E(n/\theta, R_{23}).$

This inequality, however, will be close only if the probability of concluding the
test before $N$ observations, given that $\theta_2 < \theta < \theta_3$, is small.

Some investigations by the authors seem to indicate that the inequality (6.11)
will be close when $\Delta < \lambda$. This inequality is likely to be fulfilled in practical
problems.

We shall now proceed to determine the value of $E(n/\theta, R'_{23})$. Neglecting the
excess over the boundary, we have

(6.12)    $E\left(n/\theta, R'_{23}, \sum_{i=1}^{N} x_i = y\right) = N + \frac{y}{\lambda - \theta}$ for $y > 0$


3 The event $y = 0$ has probability zero and it is indifferent what rule is adopted for that
case.





and

(6.13)    $E\left(n/\theta, R'_{23}, \sum_{i=1}^{N} x_i = y\right) = N - \frac{y}{\lambda + \theta}$ for $y < 0$

where, for any condition $C$, $E(n/\theta, R, C)$ denotes the conditional expected value
of $n$ given that the true mean is $\theta$, that $R$ is the sequential rule used and that the
condition $C$ is fulfilled.

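Multiplying with the density of $y$ and then integrating with respect to $y$, we
obtain after simplification

(6.14)    $E(n/\theta, R'_{23}) = \frac{1}{\lambda^2 - \theta^2}\left[h\lambda + 2h\theta\, \phi\!\left(\frac{\theta}{\sigma}\sqrt{\frac{h}{\lambda}}\right) + 2\lambda\sigma \sqrt{\frac{h}{2\pi\lambda}}\; e^{-(h\theta^2/2\lambda\sigma^2)}\right]$

where $\phi(x) = \frac{1}{\sqrt{2\pi}} \int_0^x e^{-y^2/2}\, dy$, and $\theta_2 < \theta < \theta_3$.

In particular, for $\theta = 0$ we get

(6.15)    $E(n/\theta = 0, R'_{23}) = \frac{h}{\lambda} + \frac{\sigma}{\lambda}\sqrt{\frac{2h}{\pi\lambda}}.$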
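The bound for the middle interval can be obtained by averaging the conditional expectations $N + y/(\lambda - \theta)$ for $y > 0$ and $N - y/(\lambda + \theta)$ for $y < 0$ over the normal distribution of $y$ (mean $N\theta$, variance $N\sigma^2$). The sketch below evaluates the resulting closed form; it is an illustration, the function names are hypothetical, and $\phi$ is taken here as the normal integral from 0 to $x$.

```python
import math

def phi0(x):
    """(1/sqrt(2*pi)) * integral from 0 to x of exp(-t**2/2) dt."""
    return 0.5 * math.erf(x / math.sqrt(2.0))

def asn_upper_middle(theta, h, lam, sigma):
    """Upper bound of type (6.14) for |theta| < lam, with N = h/lam."""
    N = h / lam
    x = theta * math.sqrt(N) / sigma
    gauss = math.exp(-N * theta ** 2 / (2.0 * sigma ** 2))
    numerator = (h * lam
                 + 2.0 * h * theta * phi0(x)
                 + 2.0 * lam * sigma * math.sqrt(N / (2.0 * math.pi)) * gauss)
    return numerator / (lam ** 2 - theta ** 2)
```

With $h = 28$, $\lambda = 1/4$ and $\sigma = 1$ the sketch gives roughly 146, 163, 229 and 450 at $\theta = 0, 1/16, 2/16, 3/16$, and it is an even function of $\theta$.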


To establish a close upper bound for $\theta_3 < \theta < \theta_4$ we must bring the line of
acceptance of $\theta_4$ into account. The line of acceptance of $\theta_1$ can be disregarded
since the probability of accepting $\theta_1$ is very small.

We therefore define the rule $R_{234}$ as follows:

"Continue with $R_1$ until $\theta_2$ is accepted and with $R_2$ until either $\theta_3$ or $\theta_4$ is
accepted."

Since the ASN function for $R_{234}$ is difficult to compute we define a modified
rule $R'_{234}$ as follows:

"Proceed to take $N = \frac{h}{\lambda}$ observations without regard to any rule. If $y = \sum_{i=1}^{N} x_i < 0$
then continue only with $R_1$ until $\theta_2$ is accepted. If $0 < y < 2h$ then
continue only with $R_2$ until either $\theta_3$ or $\theta_4$ is accepted. If $y \ge 2h$ then stop taking
observations and accept $H_3$."

It is clear that the following inequalities hold

(6.16)    $E(n/\theta, R'_{234}) \ge E(n/\theta, R_{234}) \ge E(n/\theta, R).$

The proximity of $E(n/\theta, R_{234})$ and $E(n/\theta, R)$, as stated above, is based on the
fact that the probability of accepting $\theta_1$, when $\theta_3 < \theta < \theta_4$, is small.

The proximity of $E(n/\theta, R'_{234})$ and $E(n/\theta, R_{234})$ is assured if the probability
of terminating with $R_{234}$ (and with $R$) before $N$ observations is small. It can be
shown that the latter condition is fulfilled when $\Delta < \lambda$. In terms of the quantity
$r$ defined in Section 5 this can be written as $r > 3$.

To determine the value of $E(n/\theta, R'_{234})$ the following two preliminary results
will be needed:

If $0 < y < 2h$,

(6.17)    $E\left(n/\theta, R'_{234}, \sum_{i=1}^{N} x_i = y\right) = N + \frac{2h - y - 2h\,\dfrac{1 - e^{-(2/\sigma^2)(\lambda - \theta)(2h - y)}}{1 - e^{-(4h/\sigma^2)(\lambda - \theta)}}}{\theta - \lambda} = C$ (say).

If $y < 0$,

(6.18)    $E\left(n/\theta, R'_{234}, \sum_{i=1}^{N} x_i = y\right) = N - \frac{y}{\lambda + \theta} = D$ (say).

Both are easily obtained from formula (7.25) on p. 123 of [1].

Multiplying with the density of y and integrating with respect to y, we obtain 
after simplification 


E(n/8, R' 3i ) = £ + 




(d 2e' 


—(2A(X—9)/ir*J 


(6.19) 


(x - <?)( x 


+ 


x(x - e) y 2 tt 

m 

2X(X + 0) 


ft\ J g -(X82/2X»2> _ e -X(2X-8) a ;2\» a j 


.i-^CvDl+xim) /I 


-(Ji8*/2X»2) 


Formula (6.19) is an improvement on (6.14) as it will give for any $\theta$ a smaller
upper bound, but in the neighborhood of the origin the difference is insignificant.
For $\theta = \lambda$ we obtain from (6.19) using L'Hôpital's rule


( 6 . 20 ) 


E(n/\, R'u) = h l -±- (4ftX - 3<r 2 ) 

<T* 4A.CT 




-(XX/J» 2 ) 


If 


Vftx 


> 2.5, the above formula can be approximated by 


( 6 . 21 ) 


h i 2 <r , /ft A (ftx/ 2 <r*) 


E(n/\, Ru) ~ n -+±Z a/!£ e 


n ft 2 

Since the right hand member above lies between - and (1.002) — when 
frr a ° 

-- > 2.5 then for practical purposes 


( 6 . 22 ) 


E(n/X, Z&) ~ \ 

G* 


when ( W ^ . 5 


. 5 ). 





An upper bound for $E(n/\theta, R)$ for $\theta_1 < \theta < \theta_2$ can be obtained by defining
$R_{123}$ and $R'_{123}$ in an analogous way to $R_{234}$ and $R'_{234}$. Because of reasons of symmetry,
$E(n/\theta, R'_{123})$ can be obtained from (6.19) by replacing $\theta$ by $-\theta$.

The method used for obtaining upper bounds for $E(n/\theta, R)$ can easily be
extended to the more general case when the equalities $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}}$ do not
necessarily hold. However, the resulting formulas are more cumbersome and we
shall merely give without proof the upper bound corresponding to (6.14). This
upper bound becomes

- N + - *.] + “ * W ] 


/N\c~ aVi e~ btn ~ 
y 27T 0 X ’■(' 


where 


hn = - log A 
hi = log A 


ho = — log B 
hw = ~ log B 


hu — ha 


<h = -a! 
h - NO 
cVN ; 1 


h + NO 

*Vn'‘ 


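hi + ho
2


7. An Example. We shall consider the following example:

    $\sigma = 1$, $\theta_1 = -\frac{5}{16}$, $\theta_2 = -\frac{3}{16}$, $\theta_3 = \frac{3}{16}$, $\theta_4 = \frac{5}{16}$, $\gamma_1 = \gamma_2 = \gamma_3 = \gamma = .029$;

then

    $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}} = \frac{1 - \gamma}{\gamma} = 33.5$

    $r = 7 > 3 > k = 1.47$

    $h = \frac{\sigma^2}{\Delta} \log A = 28$,  $\lambda = \frac{\theta_3 + \theta_4}{2} = \frac{1}{4}$,  $\Delta = \frac{1}{8}$.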

Using formulas (6.1) and (6.7) the following upper and lower bounds were obtained:

$\theta$                      8/16    9/16    10/16   12/16   14/16   16/16   18/16   20/16

Upper and Lower Bound         112     89.6    74.7    56      44.8    37.3    32      28






























Formulas (6.14) and (6.1) yield

$\theta$           0       1/16    2/16    3/16

Upper Bound        146     163     229     450

Lower Bound        112     149     224     421


In the neighborhood of the origin the true value is very nearly the upper bound. 
From formulas (6.19), (6.22) and (6.1) we obtain

$\theta$           3/16    4/16    5/16

Upper Bound        422     784.5   423

Lower Bound        421     784     421


As shown above for the end points of the indifference zone, (6.19) gives better 
results than (6.14) or (6.7). This is as it should be since (6.19) takes into account 
possibilities omitted in (6.14) and (6.7). The greater accuracy of (6.19) is offset 
by a slight increase in computation. 

In the graph of the Bounds of the ASN function shown in Figure 2, a single
curve is shown wherever the upper and lower bound are sufficiently close to
each other.

Since (6.14) contains an even function of $\theta$ and since elsewhere the corresponding
bounds are mirror images with respect to $\theta = 0$, the bounds for negative $\theta$
are exactly the same as those for the corresponding positive $\theta$.

Consider the following non-sequential rule applied to our problem. With a
fixed number $N_0$ of observations compute the mean $\bar{x}$ and accept $H_1$ if $\bar{x}$ falls in
the interval $(-\infty, a_1)$, accept $H_2$ if $\bar{x}$ falls in $[a_1, a_2]$ and accept $H_3$ if $\bar{x}$ falls in
$(a_2, \infty)$. This is certainly a reasonable procedure. One can also verify that no
other non-sequential rule exists that is uniformly better (for all possible values of
$\theta$) than the one under consideration.

The two decision procedures become comparable if we introduce the indifference
zones and define a wrong decision in the non-sequential case exactly as was
done for our sequential procedure (see Section 1).

For the non-sequential case (just as in the sequential case) the probability of a
wrong decision will be discontinuous at $\theta_1$, $\theta_2$, $\theta_3$ and $\theta_4$. At each of these points
there will be a left-sided and right-sided limit, different from each other. As in the
sequential case we shall take the probability of a wrong decision at a discontinuity
point to be equal to the larger of the left and right hand limits. One can easily
verify that the maximum probability of a wrong decision occurs at $\theta = \theta_3$ (which
is equal to the value at $\theta = \theta_2$).






We then determine $N_0$ by setting the maximum probability of a wrong decision
equal to $\gamma$, i.e.

(7.1)    $\phi\!\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) + \phi\!\left(\frac{2d - \Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma.$

For the particular problem considered above, this gives $N_0 = 915.4$. Hence
916 observations are required in order to ensure that this non-sequential procedure
will have the maximum probability $\gamma = .029$ of a wrong decision. This
is to be compared with the maximum over all $\theta$ of the ASN function in the
sequential procedure, which was 784.5.

Returning to (7.1) we shall derive lower and upper bounds for the root of that
equation. Since

    $\infty > \frac{2d - \Delta}{2\sigma}\sqrt{N_0} \ge \frac{\Delta}{2\sigma}\sqrt{N_0},$











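it is clear that the root of the equation

    $\phi\!\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) + \phi\!\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma$

is an upper bound for the root of (7.1) and that the root of the equation

    $\phi(\infty) + \phi\!\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) = 1 - \gamma$

or

    $\phi\!\left(\frac{\Delta}{2\sigma}\sqrt{N_0}\right) = \frac{1}{2} - \gamma$

is a lower bound for the root of (7.1). We shall compare the value of $x = \frac{\Delta}{2\sigma}\sqrt{N_0}$
with the value of $y = \frac{\Delta}{2\sigma}\sqrt{\text{Max ASN}}$. Since

    Max (ASN function) $\approx \frac{\sigma^2}{\Delta^2}\left(\log \frac{1 - \gamma}{\gamma}\right)^2$ (for sufficiently small $\frac{\Delta}{d}$),

then

    $y = \frac{\Delta}{2\sigma}\sqrt{\text{Max ASN}} \approx \frac{1}{2} \log \frac{1 - \gamma}{\gamma}$ (for sufficiently small $\frac{\Delta}{d}$).

The following table gives upper and lower bounds for $x$ and the corresponding
value of $y$ for the type of example under consideration, i.e., when $A = \hat{A} = \frac{1}{B} = \frac{1}{\hat{B}}$
and $r \ge k$.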


$\gamma$                 .001         .002         .005         .008         .01          .05          .1

$x$ (lower-upper)        3.08-3.31                 2.57-2.81    2.41-2.65    2.33-2.58    1.64-1.98    1.28-1.65

$y$                      3.45         3.11         2.65         2.41         2.30         1.47         1.10


As the table shows$^4$ for $.1 \ge \gamma \ge .008$

    $x > \underline{x} > y$

where $\underline{x}$ is the tabulated lower bound for $x$,

4 Actually, the inequality in question is shown only for the values of $\gamma$ given in the
table. However it can be verified that the inequality remains valid for all values of $\gamma$
between .1 and .008.




















and hence

    $N_0 >$ Max ASN (for sufficiently small $\frac{\Delta}{d}$).

The statement and the table above are not meant to delimit the region in which 
the sequential rule is superior to the non-sequential procedure. 

REFERENCES 

[1] Abraham Wald, Sequential Analysis, John Wiley and Sons, 1947.

[2] P. Armitage, "Some sequential tests of Student's hypothesis," Supplement to the Journal
of the Royal Statistical Society, Vol. 9 (1947), No. 2, p. 250.



MOMENTS OF RANDOM GROUP SIZE DISTRIBUTIONS 1 

By John W. Tukey 
Princeton University 

1. Summary. A number of practical problems involve the solution of a mathematical
problem of the class described in the classical language of probability
theory as follows. "A number of balls are independently distributed among a
number of boxes, how many boxes contain no balls, 1 ball, 2 balls, 3 balls, and
so on." Problems arising in the oxidation of rubber and the genetics of bacteria
are discussed as applications.

A method is given of solving problems of this sort when “how many” is 
adequately answered by the calculation of means, variances, covariances, third 
moments, etc. The method is applied to a number of the simplest cases, where 
the number of balls is fixed, binomially distributed or Poisson and where the 
“sizes” of the boxes are equal or unequal. 

2. Introduction. The distribution of the number of empty boxes has been
investigated by Romanovsky in 1934 [3], and, apparently independently, by
Stevens in 1937 [4]. Romanovsky investigated the case of N equal boxes and
m balls for (i) the case where the balls are independent, and (ii) the case where
there is a limit to the size of each box. He gives no motivation for the problem,
and shows that certain limiting distributions approach normality. Stevens
investigated the case of m independent balls for N boxes (i) of equal size, and
(ii) of unequal size, and developed a useful approximation for the last case.
Stevens was concerned with this problem in order to test box counts for nonrandomness
by comparing the number of empty boxes with expectation. The
reader interested in that problem is referred to his paper.

The results derived in Part II are based on the use of a chance generating
function, a technique which applies easily to the case where the balls are independent.
Thus Romanovsky's results for the case of boxes of limited size are
neither included nor extended. For the other cases where the number of empty
boxes has been considered, the results below seem to provide simple moments
and cross-moments for the numbers of boxes with any number of balls to the
extent previously available for the number of empty boxes. Both Romanovsky
and Stevens investigated the actual distribution of the number of empty boxes. A
similar investigation of the distribution of the number of b-ball boxes has not
been carried out here.

3. A chemical problem. In studying the oxidation of rubber, Tobolsky and
coworkers were led to propose the following problem: If a mass of rubber
originally consisted of N chains of equal length, if each chain can be broken at a

1 Prepared in connection with research sponsored by the Office of Naval Research. 



large number of places by the reaction with one oxygen molecule, if there are m
oxygen molecules each equally likely to react at each link, and if mNp molecules
have reacted, what is the probable number of original chains which are now in
b + 1 parts as a result of b oxygen molecules having reacted with b of their links?

Here an original chain plays the role of a box and an oxygen molecule the
role of a ball. The sort of numbers which may be taken as characteristic are:

$N = 10^{18}$ (number of chains),

m = 10” to 10 M (number of oxygen molecules), 
mp = 0.01 to 100 (average breaks/chain). 

Thus it is almost certainly going to be appropriate to use the results obtained by
assuming N and m very large and $p \le 1/N$ very small. We shall return to this
example after discussing the general results.


4. A bacteriological problem. The experiments of Newcombe [1] on the
irradiation and mutation of bacteria have prompted Pittendrigh to propose the
following problem: "Suppose a large number of bacteria each contain m enzyme
particles, which have been formed by the action of a nuclear gene. Suppose
that irradiation destroys the nuclear gene in a certain fraction of the bacteria.
Suppose three generations to occur, during which the m original enzyme particles
are randomly distributed among the 8 descendants of an original bacterium.
If a bacterium without either nuclear gene or enzyme particle is a recognizable
mutant, what is the expected distribution of "families" with 0, 1, 2, 3, ..., 8
mutants?"

Here the enzyme particles are the balls, and the 8 descendants are the N
boxes. We are interested in the number of empty boxes; the problem is that
discussed by both Romanovsky and Stevens, with the exception of an allowance
for cases where the nuclear gene was not lost. We shall return to this problem
also after discussing the general results.


5. The case of large numbers. In case the number of "balls" and "boxes" is 
large, it is natural and has been customary in similar problems to replace discrete 
variables by continuous, and derive differential equations. The process runs as 
follows: Let $y_0, y_1, y_2, \ldots, y_b, \ldots$ be the fractions of the total number of boxes 
containing no, one, two, ..., b, ... balls. Let t be the average number of balls 
per box (artificially made continuous, so that we may, for example, have a total 
of $13 + 3\pi$ balls). Increase t to t + dt; then of the $y_0$ boxes previously containing 
no balls, $y_0\,dt$ will receive one. Of the $y_1$ boxes previously containing one ball each, 
$y_1\,dt$ will receive a second, and so on. Hence 

$$\frac{dy_0}{dt} = -y_0,$$

$$\frac{dy_1}{dt} = y_0 - y_1,$$


RANDOM GROUP SIZE DISTRIBUTIONS 




$$\frac{dy_b}{dt} = y_{b-1} - y_b,$$

and if we start, when t = 0, with $y_0 = 1$, and $y_b = 0$ for b > 0, we find 

$$y_b = \frac{t^b}{b!}\, e^{-t}, \qquad b = 0, 1, 2, \ldots \tag{1}$$
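
As an illustrative check that is not part of the original paper, the differential equations can be integrated numerically and compared with the Poisson form. The sketch below assumes a plain Euler scheme; the function names, the step count, and the truncation at `b_max` are choices made here for illustration only.

```python
import math


def integrate_box_fractions(t_end, b_max=10, steps=20000):
    """Euler-integrate dy_b/dt = y_{b-1} - y_b (with y_{-1} = 0) from t = 0.

    Returns the list [y_0, ..., y_{b_max}] at t = t_end.  The top index
    is a truncation: it does not affect the lower fractions, because each
    y_b depends only on y_{b-1}.
    """
    y = [1.0] + [0.0] * b_max          # at t = 0 every box is empty
    dt = t_end / steps
    for _ in range(steps):
        new = y[:]
        new[0] = y[0] - y[0] * dt
        for b in range(1, b_max + 1):
            new[b] = y[b] + (y[b - 1] - y[b]) * dt
        y = new
    return y


def poisson_fraction(t, b):
    """Closed form t^b e^{-t} / b! for the fraction of boxes with b balls."""
    return t ** b / math.factorial(b) * math.exp(-t)


if __name__ == "__main__":
    t = 2.0
    numeric = integrate_box_fractions(t)
    for b in range(5):
        print(b, round(numeric[b], 5), round(poisson_fraction(t, b), 5))
```

With a small enough step the two columns printed by the demo should agree to several decimal places, which is all the sketch is meant to show.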


The usefulness of this result has sometimes been in doubt; thus Opatowski 
[2, p. 164] says in a similar connection: "Consequently ... the theory appears 
less accurate for small values of t." 

It is shown in Part II that, where $n_b$ boxes out of the total of N contain 
exactly b balls: (I) When the number of balls and boxes is large and fixed, (1) 
is a good approximation to the expectation of $n_b/N$. (II) When the total number 
of balls has a Poisson distribution, and t is interpreted as the expected number, 
(1) reproduces the expectation exactly. Since it is appropriate in most problems 
involving chemical reactions or irradiation to take the number of balls as having a 


TABLE 1 


A fixed or binomial number of balls and equal boxes 








Poisson distribution, the caution suggested by (I) is often shown unnecessary 
by (II). For this type of problem the differential equation is entirely adequate. 

It is further shown in Part II that, in the Poisson case, the second moments 
are exactly those which correspond to random sampling from an infinite 
population with the fractions indicated by the mean number of boxes with 0, 1, 2, ..., 
b, ... balls. This result is not accidental, and it is shown in Part III how we can 
see directly that the whole distribution in this case is that of random sampling 
from such a population. 

6. The case of small numbers. The results of Part II also allow us to state 
the means, variances, and covariances, for the cases where the differential 
equations do not apply. The results are set forth in the following tables: Tables 1 
and 2 apply to the cases where m balls are distributed among the given boxes 
and possibly others. Thus the total number of balls in the given boxes is either 
fixed, when there are no other boxes, or follows a binomial distribution, 


TABLE 2 

A fixed or binomial number of balls and unequal boxes 


HYPOTHESIS 

A total of m balls are independently distributed into N boxes or elsewhere, the 
chance of a particular ball entering the ith box being $p_i$. The average of the 
$p_i$ is $p$. The sum of the squared fractional deviations of $p_i$ from $p$ is $A$: 
$p_i = p(1 + \lambda_i)$, $\sum_i \lambda_i^2 = A$. Terms in $\sum_i \lambda_i^3$, $\sum_i \lambda_i^4$, etc., are to be neglected. The 
number of boxes each containing exactly b balls is $n_b$. 


Mean of n b = E(n b ) = N ^ (1 - p) m ~ b p b times 

{(* + ^ mp ~ ~ } 
Variances and covariances as in Table 1, using 


where f ~ 2 be 


& ~ ff) 


+ terms in p 2 and in 


The exact value of f is given in Section 16. 







TABLE 3 

Poisson balls and equal boxes 



7. Discussion of the chemical problem. The number of oxygen molecules 
which have reacted in a given time is, at best, distributed Poisson. Thus the 
differential equations would give the expected number of cuts, even if the 
number of balls or boxes were not large. 

The fact that the numbers of balls and boxes are large makes the variances 
and covariances so small as to be practically unimportant. Thus, for example, 
with $N = 10^{18}$, t = 1 (1 break per chain), we have: 

mean of $n_0 = \frac{1}{e} \times 10^{18}$, 

mean of $n_1 = \frac{1}{e} \times 10^{18}$, 

variance of $n_0 = \frac{1}{e}\left(1 - \frac{1}{e}\right) \times 10^{18}$, 

variance of $n_1 = \frac{1}{e}\left(1 - \frac{1}{e}\right) \times 10^{18}$, 

covariance of $n_0$ and $n_1 = -\frac{1}{e^2} \times 10^{18}$. 

Thus the standard deviations are less than 1 part in 100 million of the mean. 
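
The figures quoted for the chemical example can be recomputed directly. The following is a minimal sketch, assuming the Poisson case with equal boxes, where the counts behave like a random sample of N from a population with the Poisson fractions; the function names are hypothetical.

```python
import math


def poisson_fraction(t, b):
    """Expected fraction of boxes holding exactly b balls when the mean is t."""
    return t ** b / math.factorial(b) * math.exp(-t)


def poisson_box_moments(N, t, b, c):
    """Return (mean of n_b, variance of n_b, covariance of n_b and n_c), b != c.

    Uses the sampling form: mean N*p_b, variance N*p_b*(1 - p_b),
    covariance -N*p_b*p_c, with p_b the Poisson fraction.
    """
    pb, pc = poisson_fraction(t, b), poisson_fraction(t, c)
    return N * pb, N * pb * (1 - pb), -N * pb * pc


if __name__ == "__main__":
    mean0, var0, cov01 = poisson_box_moments(1e18, 1.0, 0, 1)
    print("mean of n_0      :", mean0)
    print("variance of n_0  :", var0)
    print("covariance       :", cov01)
    print("sd / mean        :", math.sqrt(var0) / mean0)
```

The last printed ratio is the quantity the text compares with one part in 100 million.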







TABLE 4 

Poisson halls and varied boxes 


HYPOTHESIS 

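A number of balls with the Poisson distribution are independently placed in N 
unequal boxes. The expected number placed in the ith box is $t_i$. The average of 
the $t_i$ is $t$, $t_i = t(1 + \lambda_i)$ and $\sum_i \lambda_i^2 = A$. Terms in $\sum_i \lambda_i^3$, $\sum_i \lambda_i^4$, etc. are to be 
neglected. The number of boxes each containing exactly b balls is $n_b$. 


Mean of $n_b = E(n_b) = N \frac{t^b}{b!} e^{-t} \left(1 + \frac{A}{2N}\left((b - t)^2 - b\right)\right)$ 

Variance of $n_b = E(n_b) - \frac{1}{N}\left(1 + \frac{A}{N}(b - t)^2\right)\left(E(n_b)\right)^2$ 

Covariance of $n_b$ and $n_c = -\frac{1}{N}\left(1 + \frac{A}{N}(b - t)(c - t)\right)E(n_b)E(n_c)$ 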


Mean of n 3 = Ne 1 (1 + 


Mean of rn = Nte * (1 + 


2N / 

A (f J - 2<)' 
2V , 


Variance of no = We ‘ ^1 + - Ne~ n ^1 + 

Variance of n x = Me -1 ^1 4- ^ 

_ (i + 

Covariance of no and m = —W£ 2 e _!!< -j- ^ 


Covariance of no and ?u = —Nt e I 1 -j- 


8. Discussion of the bacteriological example. Although this example came 
from an irradiation experiment, we are not entitled to jump to the Poisson 
case. The balls are not actions of radiation, but rather previously existing 
enzyme particles. The purpose of the radiation is merely to make a failure to 
hand down a particle obvious. 

For simplicity, let us begin by assuming that the irradiation has been strong 
enough to knock out all the nuclear genes and none of the enzyme particles. We 
face the following problem: “If the m enzyme particles are divided by chance 
among 8 descendants, what should be the distribution of mutants, that is, of 
boxes with no balls?” 

As far as mean and variance are concerned, we can answer this question from Table 1, 
with N = 8 and p = 1/8. 






The results are 

mean number of mutants = $E(n_0) = 8\left(\tfrac{7}{8}\right)^m$, 
variance of same = $8\left(\tfrac{7}{8}\right)^m - 64\left(\tfrac{7}{8}\right)^{2m} + 56\left(\tfrac{3}{4}\right)^m$. 
For small values of m we get the values tabled below: 

TABLE 5 

Blanks out of 8 

  m      mean     variance    mean(1 - mean/8) 
  0     8          0.000        0.000 
  1     7          0.000        0.875 
  2     6.125       .109        1.436 
  3     5.359       .262        1.769 
  4     4.689       .417        1.941 
  5     4.103       .556        1.998 
  6     3.590       .666        1.979 
  7     3.142       .747        1.908 
  8     2.749       .799        1.804 
  9     2.405       .825        1.682 
 10     2.105       .829        1.551 
 15     1.079       .663         .934 
 20     0.554       .426         .515 


We notice that the variance is substantially less than the mean. 
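
The entries of Table 5 can be recomputed from the two formulas above. The sketch below is an illustration added here, with a hypothetical function name; the third returned value is the quantity mean(1 - mean/8) shown in the last column of the table.

```python
def empty_box_moments(m, N=8):
    """Moments of the number of empty boxes when m balls fall independently
    and with equal chance into N boxes (p = 1/N).

    Returns (mean, variance, mean * (1 - mean / N)).
    """
    p = 1.0 / N
    mean = N * (1 - p) ** m
    # E(n_0^2) = N(N-1)(1-2p)^m + N(1-p)^m, so the variance is:
    variance = mean + N * (N - 1) * (1 - 2 * p) ** m - mean ** 2
    last_column = mean * (1 - mean / N)
    return mean, variance, last_column


if __name__ == "__main__":
    for m in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]:
        mean, var, last = empty_box_moments(m)
        print(f"{m:3d} {mean:7.3f} {var:7.3f} {last:7.3f}")
```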

Now it might be that the number of enzyme particles is not constant from 
bacterium to bacterium. It would not be unreasonable if it had a Poisson 
distribution. If this were the case, we would revert to the differential equation 
solution, which is also given in Table 3. The last column in Table 5 shows the 
variance which would then arise for the same means. The variance is still 
somewhat less than the mean. The situation is shown graphically in Figure 1. 

If the actual distribution of n 0 is desired, then it can be calculated for the 
case where m is fixed from the tables in Stevens' paper [4], and when m is 
distributed Poisson it is merely a binomial distribution. 

PART II 

DERIVATIONS 

9. The chance generating function. We are considering the following class of 
problems: "balls" are placed independently in "boxes" and then the number $n_0$ 
of empty compartments, the number $n_1$ of compartments containing exactly 








one ball, ..., the number $n_b$ of boxes with exactly b balls, and so on, are observed. 
We are interested in the moments of $n_0, n_1, n_2, \ldots, n_b, \ldots$ both simple 
and mixed. 


[Figure 1. Ratio of variance to mean for number of empty boxes out of eight.] 

We define chance quantities $x_{iq}$ by 

$$x_{iq} = \begin{cases} x, & q\text{th ball in the } i\text{th box}, \\ 1, & \text{otherwise}. \end{cases}$$

Clearly the product of all $x_{iq}$ for fixed i is given by 

$$\prod_q x_{iq} = x^{(\text{number of balls in the } i\text{th box})}.$$

Thus $\prod_q x_{iq} = x^b$ if and only if there are exactly b balls in the ith box. Hence 
the coefficient of $x^b$ in $\sum_i \prod_q x_{iq}$, the sum of $\prod_q x_{iq}$ over all boxes i, is $n_b$, the 
number of boxes containing exactly b balls. 






We have the relation 

$$\sum_b n_b x^b = f(x) = \sum_i \prod_q x_{iq},$$

where f(x) is a chance function, and the $n_b$ and the $x_{iq}$ are chance quantities. 

Now we take expectations of both sides, and use the fact that the expectation 
of a sum is the sum of the expectations to obtain 

$$\sum_b x^b E(n_b) = E(f(x)) = \sum_i E\Bigl(\prod_q x_{iq}\Bigr).$$

Now $x_{iq}$ and $x_{ir}$, for $q \ne r$, are independent since they are determined by 
different and independent balls. Hence $E(\prod_q x_{iq}) = \prod_q E(x_{iq})$ and we have the 
basic formula 

$$E(f(x)) = \sum_b x^b E(n_b) = \sum_i \prod_q E(x_{iq}). \tag{1}$$

10. Higher moments. By extending this device, we can obtain generating 
functions for higher moments. Instead of the $x_{iq}$, we introduce a whole sequence 
of chance quantities $x_{iq}, y_{iq}, z_{iq}, \ldots, w_{iq}$, defined by 

$$(x_{iq}, y_{iq}, \ldots, w_{iq}) = \begin{cases} (x, y, \ldots, w), & q\text{th ball in } i\text{th box}, \\ (1, 1, \ldots, 1), & \text{otherwise}. \end{cases}$$

We find immediately that 

$$f(x) f(y) \cdots f(w) = \Bigl(\sum_i \prod_q x_{iq}\Bigr)\Bigl(\sum_j \prod_q y_{jq}\Bigr) \cdots \Bigl(\sum_n \prod_q w_{nq}\Bigr) = \sum_i \sum_j \cdots \sum_n \prod_q x_{iq} y_{jq} \cdots w_{nq}.$$

Taking expectations on both sides 

$$E(f(x) f(y) \cdots f(w)) = \sum_i \sum_j \cdots \sum_n E\Bigl(\prod_q x_{iq} y_{jq} \cdots w_{nq}\Bigr) = \sum_i \sum_j \cdots \sum_n \prod_q E(x_{iq} y_{jq} \cdots w_{nq}),$$

where we have used the fact that $x_{iq} y_{jq} \cdots w_{nq}$ and $x_{ir} y_{jr} \cdots w_{nr}$ are independent 
when $q \ne r$ since they are determined by different and independent balls. 
On the other hand, 

$$f(x) f(y) \cdots f(w) = \Bigl(\sum_b n_b x^b\Bigr)\Bigl(\sum_c n_c y^c\Bigr) \cdots \Bigl(\sum_a n_a w^a\Bigr) = \sum_b \sum_c \cdots \sum_a (n_b n_c \cdots n_a)(x^b y^c \cdots w^a)$$

so that 

$$E(f(x) f(y) \cdots f(w)) = \sum_b \sum_c \cdots \sum_a (x^b y^c \cdots w^a)\, E(n_b n_c \cdots n_a).$$

Equating the two expressions for the expectation of $f(x) f(y) \cdots f(w)$, we have, 
finally, the generating function for $E(n_b n_c \cdots n_a)$ in the form 

$$\sum_{b, c, \ldots, a} (x^b y^c \cdots w^a)\, E(n_b n_c \cdots n_a) = \sum_{i, j, \ldots, n} \prod_q E(x_{iq} y_{jq} \cdots w_{nq}). \tag{2}$$

Thus a knowledge of $E(x_{iq} y_{jq} \cdots w_{nq})$ will allow us to determine the moments 
of the n's. 





11. A fixed or binomial number of balls and equal boxes. Let there be N 
boxes, and m balls, each with probability p of entering each box. If pN = 1 we 
have the case where m balls always appear in the boxes taken together, that is, the 
case of a fixed number of balls. If pN < 1, the number of balls appearing in all 
boxes taken together is a binomial with expectation mpN. 

Now $x_{iq}$ equals 1 with probability 1 - p and equals x with probability p, 
hence (1) becomes 

$$\sum_b x^b E(n_b) = \sum_i \prod_q (1 - p + px) = N(1 - p + px)^m.$$

Using the binomial theorem, the coefficient of $x^b$ is 

$$E(n_b) = N \binom{m}{b} (1 - p)^{m-b} p^b = N \binom{m}{b} (1 - p)^m \left(\frac{p}{1 - p}\right)^b. \tag{3}$$

Now if p is small, we may approximate 1 - p by $e^{-p}$ and by 1, respectively, 
in its two occurrences, whence 

$$E(n_b) \approx N \binom{m}{b} p^b e^{-mp},$$

and if m is large compared to b this becomes 

$$E(n_b) \approx N \frac{(mp)^b}{b!}\, e^{-mp}.$$
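
Formula (3) can be checked by brute force for very small cases. The sketch below is an added illustration, not the paper's method: it enumerates every placement of m balls into N boxes plus an "elsewhere" category (an assumption made here to cover pN < 1) and accumulates exact probabilities with `fractions.Fraction`. All function names are hypothetical.

```python
import itertools
import math
from fractions import Fraction


def expected_counts_formula(N, m, p):
    """E(n_b) for b = 0..m from formula (3): N * C(m, b) * (1-p)^(m-b) * p^b."""
    return [N * math.comb(m, b) * (1 - p) ** (m - b) * p ** b
            for b in range(m + 1)]


def expected_counts_enumerated(N, m, p):
    """E(n_b) by enumerating all (N+1)^m placements.

    Destinations 0..N-1 are the boxes (chance p each); destination N is
    'elsewhere' (chance 1 - N*p).  Only practical for tiny N and m.
    """
    q = 1 - N * p
    totals = [Fraction(0)] * (m + 1)
    for placement in itertools.product(range(N + 1), repeat=m):
        prob = Fraction(1)
        for dest in placement:
            prob *= p if dest < N else q
        for box in range(N):
            # this box holds placement.count(box) balls in this outcome
            totals[placement.count(box)] += prob
    return totals


if __name__ == "__main__":
    p = Fraction(1, 5)
    print(expected_counts_formula(3, 4, p))
    print(expected_counts_enumerated(3, 4, p))
```

Exact rational arithmetic is used so that the two lists can be compared for equality rather than to within rounding error.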


12. Second moments. We must study $E(x_{iq} y_{jq})$. If i = j then this is 
(1 - p + pxy) since the qth ball falls into both the ith and jth boxes with 
probability p, otherwise into neither. If $i \ne j$, we immediately find the expectation 
to be (1 - 2p + px + py). 

Hence, since i = j in N cases, and $i \ne j$ in N(N - 1) cases, 

$$\sum_{i,j} \prod_q E(x_{iq} y_{jq}) = N(1 - p + pxy)^m + N(N - 1)(1 - 2p + px + py)^m;$$

by (2) this equals $\sum_{b,c} x^b y^c E(n_b n_c)$, and using the multinomial expansion we find 

$$E(n_b n_c) = N(N - 1) \binom{m}{b\;c} (1 - 2p)^{m-b-c} p^{b+c} + \delta(b, c)\, N \binom{m}{b} (1 - p)^{m-b} p^b,$$

where $\delta(b, c) = 1$ when b = c and is zero otherwise, and where the multinomial 
coefficient is given by 

$$\binom{m}{b\;c} = \frac{m!}{b!\,c!\,(m - b - c)!}.$$

We now set 

$$E(n_b n_c) = E(n_b) E(n_c)\, \Phi(b, c) + \delta(b, c)\, E(n_b) \tag{4}$$





where 

$$\Phi(b, c) = \frac{N(N - 1) \binom{m}{b\;c} (1 - 2p)^{m-b-c} p^{b+c}}{N \binom{m}{b} (1 - p)^{m-b} p^b \cdot N \binom{m}{c} (1 - p)^{m-c} p^c} = \left(1 - \frac{1}{N}\right) \frac{(m - b)^{(c)}}{m^{(c)}} \left(\frac{1 - 2p}{(1 - p)(1 - p)}\right)^m \left(\frac{1 - p}{1 - 2p}\right)^{b+c}, \tag{5}$$

where $u^{(b)} = u(u - 1) \cdots (u - b + 1)$ denotes a descending factorial with b 
factors. 

Notice that, if the $n_b$ were independently distributed in Poisson distributions, 
the second moments would be given by the same formula with $\Phi(b, c) = 1$, 
while if they were distributed like a multinomial sample from an infinite 
population the second moments would be given by the same formula with $\Phi(b, c) = 1 - \frac{1}{N}$. 

For small p, we have 

$$\Phi(b, c) \approx \left(1 - \frac{1}{N}\right) \frac{(m - b)^{(c)}}{m^{(c)}},$$

and if m is large compared to b and c, this approaches the multinomial value 

$$\Phi(b, c) = \left(1 - \frac{1}{N}\right).$$
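
To see how the factor Φ(b, c) behaves, it can be evaluated both from its definition as a ratio of moments and from the product form with descending factorials. The sketch below is an added illustration with hypothetical function names; it uses floating point, so agreement is only to rounding error.

```python
import math


def falling(u, k):
    """Descending factorial u(u-1)...(u-k+1) with k factors."""
    out = 1
    for j in range(k):
        out *= (u - j)
    return out


def phi_product_form(N, m, p, b, c):
    """Phi(b, c) written as a product of four factors."""
    return ((1 - 1 / N)
            * falling(m - b, c) / falling(m, c)
            * ((1 - 2 * p) / (1 - p) ** 2) ** m
            * ((1 - p) / (1 - 2 * p)) ** (b + c))


def phi_direct(N, m, p, b, c):
    """Phi(b, c) as the cross term of E(n_b n_c) divided by E(n_b) E(n_c)."""
    multinom = math.factorial(m) // (
        math.factorial(b) * math.factorial(c) * math.factorial(m - b - c))
    cross = N * (N - 1) * multinom * (1 - 2 * p) ** (m - b - c) * p ** (b + c)
    e_b = N * math.comb(m, b) * (1 - p) ** (m - b) * p ** b
    e_c = N * math.comb(m, c) * (1 - p) ** (m - c) * p ** c
    return cross / (e_b * e_c)


if __name__ == "__main__":
    print(phi_product_form(5, 12, 0.1, 2, 3), phi_direct(5, 12, 0.1, 2, 3))
    # many balls, tiny p: the value should be close to 1 - 1/N
    print(phi_product_form(10, 10 ** 6, 2e-6, 1, 2), 1 - 1 / 10)
```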


13. Variances and covariances. The variances and covariances are given by 

$$\text{Variance}(n_b) = E(n_b n_b) - E(n_b) E(n_b) = E(n_b)\bigl(1 - (1 - \Phi(b, b)) E(n_b)\bigr),$$

and 

$$\text{Covariance}(n_b, n_c) = -(1 - \Phi(b, c))\, E(n_b) E(n_c).$$

Thus the covariance of $n_b$ and $n_c$ will vanish when, and only when $\Phi(b, c) = 1$. 
Let us suppose $pN = \frac{1}{\beta}$, with p small and m and N large, and see if $\Phi(b, c)$ 
can be unity. Since a preliminary calculation shows it to be reasonable, let us 
put $m = \gamma N$. Then 

$$\Phi(b, c) \approx (1 - \beta p)\, \frac{(m - b)^{(c)}}{m^{(c)}}\, (1 - p^2)^{\gamma N} (1 + p)^{b+c}.$$





An easy calculation shows that the ratio of descending factorials is nearly 

$$e^{-bc/\gamma N} = e^{-\beta b c p/\gamma};$$

making further natural approximations, 

$$\ln \Phi(b, c) \approx -\frac{\beta b c}{\gamma}\, p - \beta p - \gamma N p^2 + (b + c) p,$$

and this may be written 

$$\ln \Phi(b, c) \approx -\frac{\beta p}{4\gamma}\left(\left(\frac{2\gamma}{\beta} - b - c + \beta\right)^2 + 4\beta c - (b - \beta - c)^2\right),$$

and this vanishes for real $\gamma$ when and only when $|b - \beta - c| > \sqrt{4\beta c}$. This, 
then, is the condition on b and c which permits the existence of two ratios of m 
to N so that for either ratio and large N there will be no correlation between 
$n_b$ and $n_c$. 
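
The two ratios of m to N can be found by solving the quadratic that makes the approximate logarithm vanish. The sketch below is an added illustration under the reading β = 1/(pN) and γ = m/N; with γNp² = γp/β the approximate logarithm is -p(βbc/γ + β + γ/β - (b + c)), so the roots satisfy γ² - βγ(b + c - β) + β²bc = 0. The function names are hypothetical, and negative roots are discarded because a ratio of balls to boxes cannot be negative.

```python
import math


def log_phi_approx(b, c, beta, gamma, p):
    """Approximate ln Phi(b, c) for small p, with pN = 1/beta and m = gamma*N."""
    return -p * (beta * b * c / gamma + beta + gamma / beta - (b + c))


def zero_correlation_ratios(b, c, beta):
    """Positive values of gamma = m/N at which the approximate ln Phi is zero.

    Solves gamma^2 - beta*gamma*(b + c - beta) + beta^2*b*c = 0.
    Returns a sorted list with zero, one, or two entries.
    """
    s = b + c - beta
    disc = s * s - 4 * b * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    roots = {beta * (s - r) / 2, beta * (s + r) / 2}
    return sorted(g for g in roots if g > 0)


if __name__ == "__main__":
    for b, c in [(1, 8), (2, 2)]:
        ratios = zero_correlation_ratios(b, c, beta=1.0)
        print(b, c, ratios,
              [log_phi_approx(b, c, 1.0, g, 1e-3) for g in ratios])
```

For b = 1, c = 8 and β = 1 the condition |b - β - c| > √(4βc) holds and two ratios are returned; for b = c = 2 it fails and the list is empty.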

14. Higher moments. To deal with the third moments, we need $E(x_{iq} y_{jq} z_{kq})$, 
which is easily seen to behave as follows: 

Relation of i, j, k     Number of occurrences     Expectation of $x_{iq} y_{jq} z_{kq}$ 
i = j = k               N                         1 - p + pxyz 
i = j ≠ k               N(N - 1)                  1 - 2p + pxy + pz 
i = k ≠ j               N(N - 1)                  1 - 2p + pxz + py 
j = k ≠ i               N(N - 1)                  1 - 2p + pyz + px 
all different           N(N - 1)(N - 2)           1 - 3p + px + py + pz 

Thus we have 

$$\sum_{b,c,d} x^b y^c z^d E(n_b n_c n_d) = N(1 - p + pxyz)^m + N(N - 1)(1 - 2p + pxy + pz)^m$$
$$\qquad + N(N - 1)(1 - 2p + pxz + py)^m + N(N - 1)(1 - 2p + pyz + px)^m$$
$$\qquad + N(N - 1)(N - 2)(1 - 3p + px + py + pz)^m$$

from which we can calculate all third moments. 

In general if $\epsilon$ is a decomposition of the product $xyz \cdots w$ into a monomials 
$u_1, u_2, \ldots, u_a$, where order is disregarded (for example: xyz = (xz)y = (zx)y = 
y(zx) = y(xz) is a single decomposition with a = 2, $u_1 = xz$, $u_2 = y$), then the 
generating function becomes 

$$\sum_{\epsilon} N^{(a)} \bigl(1 + (u_1 + u_2 + \cdots + u_a - a) p\bigr)^m.$$

15. Poisson balls and equal boxes. To reach a Poisson distribution we let 
$m \to \infty$ and $p \to 0$ so that mNp = tN, where t is the average number of balls 
per box in the Poisson distribution. 










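Since 

$$\binom{m}{b} p^b \to \frac{t^b}{b!} \quad\text{and}\quad (1 - p)^{m-b} \to e^{-t}$$

under these conditions, (3) becomes 

$$E(n_b) = N \frac{t^b}{b!}\, e^{-t} \tag{6}$$

and from (5) it follows that the limit of $\Phi(b, c)$ is $\left(1 - \frac{1}{N}\right)$ so that 

$$E(n_b n_c) = N(N - 1) \frac{t^{b+c}}{b!\,c!}\, e^{-2t} + \delta(b, c)\, N \frac{t^b}{b!}\, e^{-t}, \tag{7}$$

and hence 

$$\text{Variance}(n_b) = N \frac{t^b}{b!}\, e^{-t} \left(1 - \frac{t^b}{b!}\, e^{-t}\right), \tag{8}$$

$$\text{Covariance}(n_b, n_c) = -N \left(\frac{t^b}{b!}\, e^{-t}\right) \left(\frac{t^c}{c!}\, e^{-t}\right). \tag{9}$$

Notice that these are the moments of the numbers of objects of types b, c, ..., 
in a random sample of N from an infinite population where the fraction of 
b's is $t^b e^{-t}/b!$, just as it should be. 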

16. Fixed or binomial balls and varied boxes. We now consider the case 
where the chance of any ball entering the ith box is $p_i$. We shall again not 
restrict ourselves to the case $\sum_i p_i = 1$. 

The expectation of $x_{iq}$ is immediately seen to be $(1 + p_i(x - 1)) = 
(1 - p_i + p_i x)$, so that the generating function is 

$$f(x) = \sum_i (1 - p_i + p_i x)^m$$

and the expectation of $n_b$ is 

$$E(n_b) = \binom{m}{b} \sum_i (1 - p_i)^{m-b} p_i^b = \binom{m}{b} \sum_i (1 - p_i)^m \left(\frac{p_i}{1 - p_i}\right)^b. \tag{10}$$

Following Stevens [4] with a slight modification, let us set $p_i = p(1 + \lambda_i)$, 
where p is the average of the $p_i$, so that $\sum_i \lambda_i = 0$. Then 

$$(1 - p_i) = (1 - p(1 + \lambda_i)) = (1 - p)\left(1 - \frac{p \lambda_i}{1 - p}\right)$$

so that 

$$\sum_i (1 - p_i)^{m-b} p_i^b = (1 - p)^{m-b} p^b \sum_i \left(1 - \frac{p \lambda_i}{1 - p}\right)^{m-b} (1 + \lambda_i)^b.$$




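Expanding the summand, we find 

$$1 + \left(b - \frac{(m - b) p}{1 - p}\right) \lambda_i + \left(\frac{(m - b)(m - b - 1) p^2}{2 (1 - p)^2} - \frac{(m - b) b p}{1 - p} + \frac{b(b - 1)}{2}\right) \lambda_i^2 + O(\lambda_i^3).$$

Hence, setting $\sum_i \lambda_i^2 = A$ (notice this is not the same as Stevens' A!), we have 

$$\sum_i (1 - p_i)^{m-b} p_i^b = (1 - p)^{m-b} p^b \left\{ N + \tfrac{1}{2} A\, \frac{m - b}{m - b - 1} \left(\frac{\{p(m - 1) - b\}^2}{(1 - p)^2} - \frac{b(m - 1)}{m - b}\right) + O\Bigl(\sum_i \lambda_i^3\Bigr) \right\}.$$

The expectation for all $p_i = p$ has been modified by multiplication by 

$$1 + \frac{A}{2N}\, \frac{m - b}{m - b - 1} \left\{\frac{(p(m - 1) - b)^2}{(1 - p)^2} - \frac{b(m - 1)}{m - b}\right\} \tag{11}$$

plus terms of higher order. For large N and consequently small p the quantity in 
braces is nearly 

$$b \left(b - \frac{m - 1}{m - b}\right)$$

and more roughly is approximately $b^2$. Similarly, the expectations of second 
moments are 

$$E(n_b n_c) = \binom{m}{b\;c} \sum_{i \ne j} (1 - p_i - p_j)^{m-b-c} p_i^b p_j^c + \delta(b, c) \binom{m}{b} \sum_i (1 - p_i)^{m-b} p_i^b,$$

whence 

$$\Phi(b, c) = \frac{\binom{m}{b\;c} \sum_{i \ne j} (1 - p_i - p_j)^{m-b-c} p_i^b p_j^c}{\binom{m}{b} \binom{m}{c} \sum_i (1 - p_i)^{m-b} p_i^b\, \sum_j (1 - p_j)^{m-c} p_j^c}. \tag{12}$$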

Making the same sort of expansion yields 

<* •»»0 ■- h) 0 - o~-~p)'f (}HT (> + ») 

where terms in $\sum_i \lambda_i^3$ have been neglected (note that 

$$\sum_{i \ne j} \lambda_i \lambda_j = -\sum_i \lambda_i^2 = -A),$$

and where 

m — b — c N — 2 


* = 


m — b — c — 1 2V — 1 (1 2p) * m - b - 1 (1 “ p) 


■ {p(« - 1) - b} 8 





m — b — c N — 2 o \-2 vi — c . ,_ 2 

m~ b -; - I yu (1 - 2y) - m-v- i (1 - p) 


[p(m - 1) — c} 2 


1 m — b — c { N N — 2 


2 m — b — c — 1 W — 1 iV-1 


(1 - 2p) Mi - c) 


m — b — c — 1 liV — 1 m — 6 — 1 m — c — 1 


This can be reduced to 


* = 2 be ( 2 p - i) + 0(p 2 ) + 0 
b 0(p 2 ) + 0 (j^j. 


and for p = l/A + 0(p ) + 0 


\p = 2p6c + 0(p 2 ). 

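17. Poisson balls and varied boxes. To reach the Poisson limit, we let $m \to \infty$ 
and $p_i \to 0$ so that $m p_i = t_i$. The generating function for first moments becomes 

$$f(x) = \sum_i e^{-t_i + t_i x}$$

and the expectation of $n_b$ is 

$$E(n_b) = \sum_i \frac{t_i^b}{b!}\, e^{-t_i}. \tag{15}$$

If we set $t_i = t(1 + \lambda_i)$, this becomes 

$$E(n_b) = \frac{t^b}{b!}\, e^{-t} \sum_i (1 + \lambda_i)^b e^{-t \lambda_i}.$$

The summand expands in the form 

$$\left(1 + b \lambda_i + \frac{b(b - 1)}{2} \lambda_i^2 + \cdots\right) \times \left(1 - t \lambda_i + \frac{t^2}{2} \lambda_i^2 - \frac{t^3}{6} \lambda_i^3 + \cdots\right) = 1 + (b - t) \lambda_i + \left(\frac{b(b - 1)}{2} - bt + \frac{t^2}{2}\right) \lambda_i^2 + \cdots.$$

If t is chosen as the average of the $t_i$ so that $\sum_i \lambda_i = 0$, the sum becomes 

$$N + \frac{(b - t)^2 - b}{2} \sum_i \lambda_i^2 + \cdots.$$

Again setting $\sum_i \lambda_i^2 = A$ we have 

$$E(n_b) \approx \frac{t^b}{b!}\, e^{-t} \left(N + A\, \frac{(b - t)^2 - b}{2}\right),$$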




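which can be written 

$$E(n_b) \approx N \frac{t^b}{b!}\, e^{-t} \left(1 + \frac{A}{2N} \left((b - t)^2 - b\right)\right).$$

The generating function for the second moments is 

$$f(x) f(y) = \sum_{i \ne j} e^{-t_i + t_i x - t_j + t_j y} + \sum_i e^{-t_i + t_i x y}$$

so that the expectation of $n_b n_c$ is 

$$E(n_b n_c) = \sum_{i \ne j} \frac{t_i^b t_j^c}{b!\,c!}\, e^{-t_i - t_j} + \delta(b, c)\, E(n_b), \tag{17}$$

which becomes 

$$E(n_b n_c) = \frac{t^{b+c}}{b!\,c!}\, e^{-2t} \sum_{i \ne j} (1 + \lambda_i)^b (1 + \lambda_j)^c e^{-t \lambda_i - t \lambda_j} + \delta(b, c)\, E(n_b),$$

whence we can derive 

$$\Phi(b, c) \approx 1 - \frac{1}{N} - \frac{A}{N^2} (b - t)(c - t). \tag{18}$$

Thus 

$$\text{Variance}(n_b) \approx E(n_b) - \frac{1}{N} \left\{1 + \frac{A}{N} (b - t)^2\right\} (E(n_b))^2, \tag{19}$$

$$\text{Covariance}(n_b, n_c) \approx -\frac{1}{N} \left(1 + \frac{A}{N} (b - t)(c - t)\right) E(n_b) E(n_c). \tag{20}$$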
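
The quality of the second-order correction to the mean can be examined numerically. The sketch below is an added illustration: it compares the exact sum of Poisson terms over unequal boxes with the corrected and uncorrected approximations. The function names and the sample box means are hypothetical.

```python
import math


def poisson_term(t, b):
    """t^b e^{-t} / b!"""
    return t ** b / math.factorial(b) * math.exp(-t)


def exact_mean(box_means, b):
    """E(n_b) as the exact sum over boxes of the Poisson chance of b balls."""
    return sum(poisson_term(t_i, b) for t_i in box_means)


def corrected_mean(box_means, b):
    """N * poisson_term(t, b) * (1 + A/(2N) * ((b - t)^2 - b)),
    where t is the average box mean and A the sum of squared
    fractional deviations from it."""
    N = len(box_means)
    t = sum(box_means) / N
    A = sum((t_i / t - 1) ** 2 for t_i in box_means)
    return N * poisson_term(t, b) * (1 + A / (2 * N) * ((b - t) ** 2 - b))


def uncorrected_mean(box_means, b):
    """N * poisson_term(t, b), ignoring the inequality of the boxes."""
    N = len(box_means)
    return N * poisson_term(sum(box_means) / N, b)


if __name__ == "__main__":
    means = [0.9, 0.95, 1.0, 1.05, 1.1]
    for b in range(4):
        print(b, exact_mean(means, b), corrected_mean(means, b),
              uncorrected_mean(means, b))
```

For mildly unequal boxes such as these, the corrected value should sit noticeably closer to the exact sum than the uncorrected one.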

18. Boxes in a systematic square. Another case which it may be worthwhile 
to write down arises when the boxes are systematically "rotated" under "spouts" 
of different probability. That is, the number of balls m is a multiple of the 
number of boxes N, and the probability of the qth ball entering the ith box 
depends on the value of q - i taken modulo N. An example for N = 3 and 
m = 6 follows: 

Probabilities of entry 

Box    Ball 1     2       3       4       5       6 
 1      $p_0$   $p_1$   $p_2$   $p_0$   $p_1$   $p_2$ 
 2      $p_2$   $p_0$   $p_1$   $p_2$   $p_0$   $p_1$ 
 3      $p_1$   $p_2$   $p_0$   $p_1$   $p_2$   $p_0$ 

If m = kN and the subscript r runs through 0, 1, 2, ..., N - 1, then the 
expectation of f(x) becomes 

$$\sum_i \prod_q E(x_{iq}) = N \Bigl\{ \prod_r (1 - p_r + p_r x) \Bigr\}^{m/N}.$$
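
The first moments for the rotated arrangement can be read off by expanding the product as a polynomial in x. The sketch below is an added illustration, assuming each box passes under every spout exactly k = m/N times; the function names are hypothetical.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def expected_counts_rotated(spout_probs, k):
    """E(n_b) for b = 0..m, with N = len(spout_probs) boxes and m = k*N balls.

    Expands N * (prod_r (1 - p_r + p_r x))^k and returns its coefficients.
    """
    N = len(spout_probs)
    poly = [1.0]
    for p_r in spout_probs:
        for _ in range(k):
            poly = poly_mul(poly, [1 - p_r, p_r])
    return [N * coef for coef in poly]


if __name__ == "__main__":
    # N = 3 boxes, m = 6 balls, as in the example table
    print(expected_counts_rotated([0.5, 0.2, 0.1], 2))
    # equal spouts reduce to the binomial formula for equal boxes
    equal = expected_counts_rotated([1 / 3] * 3, 2)
    binom = [3 * math.comb(6, b) * (1 / 3) ** b * (2 / 3) ** (6 - b)
             for b in range(7)]
    print(equal)
    print(binom)
```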


















Thus first moments, and by proceeding similarly higher moments, are available 
for this case also. 


PART III 


THE POISSON CASE 


19. The Poisson case with equal boxes. The Poisson case is obtained in the 
limit as $m \to \infty$ and $p \to 0$ with pm = t. We wish to show that, in the limit, the 
numbers of balls in the different boxes are independent. Let $k_1, k_2, \ldots, k_N$ be 
the number of balls in the first, second, ..., Nth box, respectively. Then the 
distribution of the k's is given by, where we write $k = k_1 + k_2 + \cdots + k_N$, 

$$\frac{m!}{k_1!\, k_2! \cdots k_N!\, (m - k)!}\; p^k (1 - Np)^{m-k} = \frac{m^{(k)}}{m^k} \cdot \frac{(1 - Np)^{m-k}}{e^{-Nmp}} \cdot \prod_i \frac{(mp)^{k_i}}{k_i!}\, e^{-mp}.$$

Now the first two fractions clearly approach unity in the limit, and the 
independence is proved. 
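
The limiting statement can be illustrated numerically by comparing the logarithm of the exact joint probability with the logarithm of a product of independent Poisson probabilities as m grows with mp = t held fixed. This is an added sketch with hypothetical function names; `math.lgamma` is used to avoid forming very large factorials.

```python
import math


def log_joint_prob(ks, m, p):
    """Log of the chance that the boxes receive ks = (k_1, ..., k_N) balls when
    each of m balls enters each box with chance p (the rest go elsewhere)."""
    N, k = len(ks), sum(ks)
    log_coef = (math.lgamma(m + 1) - math.lgamma(m - k + 1)
                - sum(math.lgamma(k_i + 1) for k_i in ks))
    return log_coef + k * math.log(p) + (m - k) * math.log(1 - N * p)


def log_poisson_product(ks, t):
    """Log of the product of independent Poisson(t) chances of k_1, ..., k_N."""
    return sum(k_i * math.log(t) - t - math.lgamma(k_i + 1) for k_i in ks)


if __name__ == "__main__":
    ks, t = (0, 2, 1), 1.5
    for m in (100, 10 ** 4, 10 ** 6):
        diff = log_joint_prob(ks, m, t / m) - log_poisson_product(ks, t)
        print(m, diff)
```

The printed differences should shrink toward zero as m increases, which corresponds to the first two fractions approaching unity.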

Since the number of balls in each box has an independent Poisson distribution, 
the distribution of the numbers of boxes each with exactly b balls is that of a 
random sample of N from an infinite population; namely it is a multivariate 
distribution with probabilities 

$$\frac{(mp)^b}{b!}\, e^{-mp}.$$


REFERENCES 

[1] H. B. Newcombe, "Delayed phenotypic expression of spontaneous mutations in 
Escherichia coli," Genetics, Vol. 33 (1948), pp. 447-476. 

[2] I. Opatowski, "Chain processes and their biophysical applications; Part I. General 
theory," Bulletin of Mathematical Biophysics, Vol. 7 (1946), pp. 161-180. 

[3] V. Romanovsky, "Su due problemi di distribuzione casuale," Giornale dell'Istituto 
Italiano degli Attuari, Vol. 5 (1934), pp. 196-218. 

[4] W. L. Stevens, "Significance of grouping," Annals of Eugenics, Vol. 8 (1937-1938), 
pp. 57-69. 



THE POWER OF THE CLASSICAL TESTS ASSOCIATED WITH THE 

NORMAL DISTRIBUTION 

By J. Wolfowitz 
Columbia University 

Summary. The present paper is concerned with the power function of the 
classical tests associated with the normal distribution. Proofs of Hsu, Simaika, 
and Wald are simplified in a general manner applicable to other tests involving 
the normal distribution. The set theoretic structure of several tests is 
characterized. A simple proof of the stringency of the classical test of a linear hypothesis 
is given. 

1. Introduction. The present paper is concerned with the optimum properties, 
from the power function viewpoint, of the classical tests associated with the 
normal distribution. In 1941 Hsu [2] proved the result stated in Section 2 below, 
which is concerned with the general linear hypothesis (in this connection his 
paper [1] of 1938 will be of interest). Also in 1941 Simaika [3] proved similar 
results for the tests based on the multiple correlation coefficient and Hotelling's 
generalization of Student's t. In 1942, Wald [4] gave a generalization of Hsu's 
result. 

In the present paper we give short and simple proofs of almost all these 
results, and a simple proof of the stringency property of the analysis of variance 
(Section 5). These proofs rest on theorems which characterize the set theoretic 
structure of the tests. Thus, while the proofs of Hsu, Simaika and Wald are 
rather elaborate and each problem is essentially attacked de novo, the methods 
of the present paper are in effect applicable to the classical tests based on the 
normal distribution. For those tests it will not be difficult to demonstrate the 
analogues of Theorems 1 and 3, and of the results of Hsu, Simaika, and Wald. 
In the present paper we first treat the general linear hypothesis, because it is the 
simplest problem, its solution is easiest to describe, and it admits Wald's 
integration theorem. Multivariate analogues of the latter are rather artificial and not as 
simple. We then discuss the problem of the multiple correlation coefficient, 
because it seems to be more difficult than that of Hotelling's T and indeed, to 
include all the essential multivariate difficulties. Theorems 6 and 7 are the 
analogues of 1 and 3, respectively, while Theorem 9 describes the essential 
property of the power function which is of interest to us. In other multivariate 
problems one will prove the analogues of Theorems 6, 7 and 9. A generally 
inclusive formulation is no doubt possible. Theorems 5 and 9 are slightly more 
general than the theorems of Hsu and Simaika. 

Many of the statements below may be not valid on exceptional sets of measure 
zero. Usually this is so stated, but sometimes, for reasons of brevity or to avoid 
repetition, this qualification may be omitted. The reader will have no difficulty 
supplying it wherever necessary. 




The author is indebted to Erich L. Lehmann of the University of California, 
who carefully read a first version of this paper. Theorem 4 below was arrived at 
independently by Professor Lehmann, with a somewhat different proof. 


2. The general linear hypothesis. In canonical form the general linear 
hypothesis may be stated as follows: The chance variables 

$$X_1, X_2, \ldots, X_{k+l}$$

have at $x_1, \ldots, x_{k+l}$ the density function 

$$(\sqrt{2\pi}\, \sigma)^{-(k+l)} \exp\left\{ -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^{k} (x_i - \eta_i)^2 + \sum_{i=k+1}^{k+l} x_i^2 \right] \right\} = f(\eta, \sigma) \tag{2.1}$$

with $\sigma, \eta_1, \ldots, \eta_k$ all unknown. 

Let $\eta$ be the vector $(\eta_1, \ldots, \eta_k)$. The null hypothesis $H_0$ states that 

$$\eta_1 = \cdots = \eta_k = 0$$

and is to be tested with constant size $\alpha < 1$ (identically in $\sigma$). 

Let D be any admissible critical region for testing $H_0$. If A is any event let 
$P\{A \mid \eta, \sigma\}$ denote the probability of A when $\eta$ and $\sigma$ are the parameters of 
(2.1). We have then 

$$P\{D \mid 0, \sigma\} = \alpha$$

identically in $\sigma$, where 0 is the vector with k components all of which are zero. 
We now prove a property which characterizes all D. This theorem is due to 
Neyman and Pearson [12], and is given here only for completeness. 

Theorem 1. The fraction of the surface area of the sphere 

$$\sum_{i=1}^{k+l} x_i^2 = c^2$$

which lies in D is $\alpha$ for almost all c. 

Proof. Let a be any positive integer, h a positive parameter, and $\psi(y)$ a 
measurable function of y defined for y > 0 and such that $0 \le \psi(y) \le 1$. In view 
of the distribution of $\sum X_i^2$, it will be enough to prove that, if 

$$\frac{h^{a+1}}{\Gamma(a + 1)} \int_0^\infty \psi(y)\, y^a e^{-hy}\, dy = \alpha$$

identically for all positive h, that then 

$$\psi(y) = \alpha \quad \text{for almost all } y.$$

Write 

$$\frac{1}{\alpha \Gamma(a + 1)} \int_0^\infty \psi(y)\, y^a e^{-hy}\, dy = h^{-(a+1)}. \tag{2.2}$$

Differentiating both members k times with respect to h and then setting h = 1 





we obtain the following result. The function 

$$\frac{1}{\alpha \Gamma(a + 1)}\, \psi(y)\, y^a e^{-y}$$

is a density function with kth moment 

$$\mu_k = (a + 1)(a + 2) \cdots (a + k).$$

The moments $\mu_k$ are the moments of the density function 

$$\frac{1}{\Gamma(a + 1)}\, y^a e^{-y}.$$

They satisfy the Carleman criterion [5, p. 19, Th. 1.10], and hence no essentially 
different distribution can have these moments. This proves the desired result. 
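
The moment formula and the divergence required by the criterion can be explored numerically. The sketch below is an added illustration, assuming the usual form of Carleman's condition, namely that the sum over k of $\mu_{2k}^{-1/(2k)}$ diverges; whether this is the exact form stated in [5] is an assumption. Partial sums can only suggest divergence, not prove it, and the function names are hypothetical.

```python
import math


def gamma_moment(a, k):
    """k-th moment (a+1)(a+2)...(a+k) of the density y^a e^{-y} / Gamma(a+1)."""
    out = 1.0
    for j in range(1, k + 1):
        out *= a + j
    return out


def carleman_partial_sum(a, terms):
    """Partial sum of mu_{2k}^(-1/(2k)) for k = 1..terms.

    Moments are handled through logarithms (lgamma) to avoid overflow:
    log mu_n = lgamma(a + n + 1) - lgamma(a + 1).
    """
    total = 0.0
    for k in range(1, terms + 1):
        log_mu = math.lgamma(a + 2 * k + 1) - math.lgamma(a + 1)
        total += math.exp(-log_mu / (2 * k))
    return total


if __name__ == "__main__":
    print(gamma_moment(2.5, 4), math.gamma(7.5) / math.gamma(3.5))
    for terms in (100, 1000, 10000):
        print(terms, carleman_partial_sum(1.0, terms))
```

The partial sums keep growing roughly like a multiple of the logarithm of the number of terms, which is the behaviour one expects when the individual terms decay like a constant over k.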

Theorem 2 (Wald). Among all tests of the general linear hypothesis the analysis 
of variance test has the property that, for all positive d, the integral of its power on the 
surface $\eta^2 = d^2$ is a maximum. 

Proof. Let c be any positive number. We have only to show that if we allocate 
to the critical region D of the test the fraction $\alpha$ of the surface area of the sphere 

$$\sum_{i=1}^{k+l} x_i^2 = c^2 \tag{2.3}$$

for which 

$$\frac{\sum_{i=1}^{k} x_i^2}{\sum_{i=k+1}^{k+l} x_i^2}$$

is as large as possible and that if we do this for all c, the desired maximum of the 
integral of the power will be achieved. If this ratio is as large as possible so is 

$$\frac{\sum_{i=1}^{k} x_i^2}{\sum_{i=1}^{k+l} x_i^2}.$$

Let $a_1, \ldots, a_{k+l}$ be any point on the sphere (2.3). Let dB be the differential of 
area on the surface $\eta^2 = d^2$. Then 

$$\int \cdots \int f(\eta, \sigma)\, dB = (\sqrt{2\pi}\, \sigma)^{-(k+l)} \exp\left(-\frac{d^2 + c^2}{2\sigma^2}\right) \int \cdots \int \exp\left(\frac{\eta' z}{\sigma^2}\right) dB \tag{2.4}$$

where z is the vector $(a_1, \ldots, a_k)$ and $\eta' z$ is the scalar product of the two 
vectors. This last integral is easily seen to depend only upon $|z|$ and to be 
monotonically increasing in $|z|$. This proves the theorem. 





Corollary (Hsu). Among all tests of the general linear hypothesis whose power 
is a function of $\eta^2$ only, the analysis of variance is the most powerful. 

3. The set theoretic structure of tests whose power is a function only of $\eta^2/\sigma^2$. 
Wald's result (Theorem 2) cannot always be extended, in its simple form, to 
tests involving the multivariate normal distribution, but this can be done with 
Hsu's theorem (corollary to Theorem 2). In order to see what is involved we 
shall investigate the set theoretic structure of tests of the general linear 
hypothesis whose power is a function only of $\eta^2/\sigma^2$. 

Let $q(x_1, \ldots, x_k)$ be the set of points in the region D whose first k coordinates 
are $x_1, \ldots, x_k$. Let $A(x_1, \ldots, x_k, \sigma)$ be the integral of 

$$(2\pi\sigma^2)^{-l/2} \exp\left[ -\frac{1}{2\sigma^2} \left\{ \sum_{j=1}^{l} x_{k+j}^2 \right\} \right]$$

with respect to $x_{k+1}, \ldots, x_{k+l}$, taken over $q(x_1, \ldots, x_k)$. We first prove the 
following: 

Lemma. Suppose the power of D is a function only of $\eta^2/\sigma^2$. Then for two points 

$$x_1, \ldots, x_k \quad \text{and} \quad x_1', \ldots, x_k'$$

such that 

$$\sum_{i=1}^{k} x_i^2 = \sum_{i=1}^{k} x_i'^2 \tag{3.1}$$

we have 

$$A(x_1, \ldots, x_k, \sigma) = A(x_1', \ldots, x_k', \sigma) \tag{3.2}$$

identically in $\sigma$, with the exception of a set of measure zero. 

Proof. Suppose the statement is false. Then under some orthogonal 
transformation T of $x_1, \ldots, x_k$ the region D would go over into a region $D^*$ with the 
following property: Let $A^*(x_1, \ldots, x_k, \sigma)$ have the same definition for the region 
$D^*$ as $A(x_1, \ldots, x_k, \sigma)$ has for D. Then on a set of positive measure we would 
have 

$$A(x_1, \ldots, x_k, \sigma) \ne A^*(x_1, \ldots, x_k, \sigma). \tag{3.3}$$

We shall now show that (3.3) results in a contradiction. We have 

$$P\{D \mid \eta, \sigma\} = P\{D^* \mid T\eta, \sigma\} \tag{3.4}$$

identically in $\eta$. By the property of the region D, therefore, we have 

$$P\{D \mid \eta, \sigma\} = P\{D \mid T^{-1}\eta, \sigma\}$$

¹ The situation here is similar to that described in footnote 3. 





and hence 

$$P\{D \mid \eta, \sigma\} = P\{D^* \mid \eta, \sigma\} \tag{3.5}$$

identically in $\eta$. Thus we obtain 

$$\int (2\pi\sigma^2)^{-k/2} A(x_1, \ldots, x_k, \sigma) \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} (x_i - \eta_i)^2 \right] dx_1 \cdots dx_k$$
$$= \int (2\pi\sigma^2)^{-k/2} A^*(x_1, \ldots, x_k, \sigma) \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} (x_i - \eta_i)^2 \right] dx_1 \cdots dx_k \tag{3.6}$$

with the integrations taking place over the entire space. Differentiating both 
members with respect to the components of $\eta$ and setting $\eta = 0$, we obtain that 
the two density functions (for fixed $\sigma$) 

$$(2\pi\sigma^2)^{-k/2} \alpha^{-1} A(x_1, \ldots, x_k, \sigma) \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} x_i^2 \right]$$

and 

$$(2\pi\sigma^2)^{-k/2} \alpha^{-1} A^*(x_1, \ldots, x_k, \sigma) \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{k} x_i^2 \right]$$

have identical moments. We shall now argue that these moments satisfy the 
conditions of Cramér and Wold [7, Th. 2], so that the two density functions are 
essentially the same, in contradiction to (3.3). The Cramér-Wold theorem states 
the following: Let $Y_1, \ldots, Y_k$ be k chance variables with a joint distribution 
function, and write 

$$\lambda_{2n} = \sum_{i=1}^{k} E\, Y_i^{2n}.$$

Then the divergence of the series 

$$\sum_{n=1}^{\infty} \lambda_{2n}^{-1/(2n)}$$

is sufficient to ensure that there exists essentially only one distribution which has 
these moments. We notice that the factor $1/\alpha$ of course makes no difference. 
If we set $A(x_1, \ldots, x_k, \sigma)$ and $A^*(x_1, \ldots, x_k, \sigma)$ both identically unity and 
consider the resulting moments which enter into the $\lambda_{2n}$, we see that these 
moments satisfy the Cramér-Wold condition. Now A and $A^*$ are $\le 1$. Thus, 
using the true value of A can serve only to increase the value of $\lambda_{2n}^{-1/(2n)}$, so that 
the series will diverge a fortiori. This proves the lemma. 

The following theorem helps to describe the set theoretic structure of tests 
whose power is a function only of $\lambda = \eta^2/\sigma^2$: 

Theorem 3. Let D be a test whose power is a function only of $\lambda$. Let u be any 
positive number, and $D(x_1, \ldots, x_k, u)$ be the fraction of the "area" of the sphere 
$\sum_{j=1}^{l} x_{k+j}^2 = u^2$ occupied by points which are in D and whose first k coordinates 
are $x_1, \ldots, x_k$. If 




(3.7) S x* = S x ? 

i i 

then, except on a set of measure zero, 

(3.8) D(x i, ■ ■ , x h , u) = D(a:(, ■ ■ ■ , x[, u). 

Proof. We shall show that, if the power of $D$ is a function only of $\lambda$, the
failure of (3.7) to imply (3.8) would contradict the preceding lemma. Suppose
then that (3.8) is not true on a set of positive measure. Under some orthogonal
transformation on $x_1, \dots, x_k$ we obtain² a function $D^*(x_1, \dots, x_k, u)$ which
differs from $D(x_1, \dots, x_k, u)$ on a set of positive measure and such that, for
almost every $x_1, \dots, x_k$,

$$A(x_1, \dots, x_k, \sigma) = K \int_0^\infty D(x_1, \dots, x_k, u)\, \sigma^{-l}\, u^{l-1}\, e^{-u^2/(2\sigma^2)}\, du$$

$$= K \int_0^\infty D^*(x_1, \dots, x_k, u)\, \sigma^{-l}\, u^{l-1}\, e^{-u^2/(2\sigma^2)}\, du$$

identically in $\sigma$, where $K$ is a suitable constant of no interest to us. Multiplying
by $\sigma^l$, differentiating repeatedly under the integral sign with respect to $\sigma$, and
setting $\sigma = 1$, we obtain the result that the two density functions in $u$,

$$\frac{K\, D(x_1, \dots, x_k, u)\, u^{l-1}\, e^{-u^2/2}}{A(x_1, \dots, x_k, 1)}$$

and

$$\frac{K\, D^*(x_1, \dots, x_k, u)\, u^{l-1}\, e^{-u^2/2}}{A(x_1, \dots, x_k, 1)}$$

are identical except perhaps on a set of measure zero. This contradiction proves
the theorem.

Theorem 4. A necessary and sufficient condition that the power of $D$ be a function
of $\lambda$ only, is that, with the usual exception of a set of measure zero, $D(x_1, \dots, x_k, u)$
be a function only of

$$\frac{\sum_{i=1}^{k} x_i^2}{u^2}.$$

The proof of this theorem is not essentially different from that of the preceding
theorem, and we shall therefore sketch it only briefly. Let $Z$ be a transformation
on $(x_1, \dots, x_k, u) = (x, u)$ which consists of a rotation of the vector $x$, followed
by a multiplication of $u$ and the components of $x$ by a positive constant $c$. If
$D(x, u)$ is not a function of $\sum x_i^2/u^2$ alone, then, just as before³, we can use some


² See footnote 1.

³ This statement implies that a function of $x_1, \dots, x_k, u$, which is invariant to within
sets of measure zero under all transformations $Z$ (the exceptional set may depend on the



540 


J. WOLFOWITZ 


transformation $Z$ to give us a function $D^*(x, u)$ such that

$$D(x, u) \ne D^*(x, u)$$

on a set of positive measure, while

$$E\,D(x, u) = E\,D^*(x, u)$$

identically in $\eta$, $\sigma$. This yields a contradiction in the usual manner and proves
the necessity of the condition.

To prove sufficiency, write $D(x, u) = \nu(\sum x_i^2/u^2) = \nu(v)$. Let $\gamma(v, \eta, \sigma)$ be the
density function of $v$. Then

$$P\{D \mid \eta, \sigma\} = \int_0^\infty \nu(v)\, \gamma(v, \eta, \sigma)\, dv.$$

By hypothesis, $\nu(v)$ is a function only of $v$. We know [9, p. 140, eq. 101] that
$\gamma(v, \eta, \sigma)$ is a function only of $v$ and $\lambda$. Hence $P\{D \mid \eta, \sigma\}$ is a function only of $\lambda$.
This completes the proof of the theorem.

Theorem 5. Among all tests of the general linear hypothesis which have the
properties described in the conclusions of Theorems 1 and 3, the classical analysis
of variance test is the most powerful.

We shall omit the proof of this theorem, which is very similar to that of the
more difficult Theorem 9 below.

Theorem 4 above shows that there exist regions $D$ which satisfy the conclusions
of Theorems 1 and 3 and such that $P\{D \mid \eta, \sigma\}$ is not a function of $\lambda$ alone. It
follows that the content of Theorem 5 is greater than that of Hsu's theorem
(Corollary to Theorem 2).

It is instructive to note that Hsu's theorem follows almost immediately from
Theorem 4 and the form of $\gamma(v, \lambda)$. For let $\lambda$ be fixed but arbitrary. One verifies
immediately from the form of $\gamma(v, \lambda)$ that

$$\frac{\gamma(v, \lambda)}{\gamma(v, 0)}$$

is, for fixed $\lambda$, a monotonically increasing function of $v$. This, by Neyman's
lemma, immediately proves Hsu's result.

4. The multiple correlation coefficient. We shall now apply our methods to a
multivariate test. For typographic ease we shall conduct the discussion for the

transformation), is a function of $\sum x_i^2/u^2$, except on a set of measure zero. This statement
would be completely trivial were it not for the exceptional sets; in any case it must be well
known to set theorists. The author constructed an unnecessarily long proof of it, and
believes that a more expeditious proof can be constructed using the ideas of [11, page 91,
Theorem 11.1, and page 318, p. 7]. Professor C. M. Stein of the University of California
has informed the author that this result is a special case of one established by himself and
G. H. Hunt in a forthcoming paper. For these reasons the proof is omitted. (See also [13,
page 27, Lemma 9.1].)




case of three variates, but the reader will observe that the procedure is really
perfectly general.

The chance variables $\{Y_{ij}\}$, $i = 1, 2, 3$, $j = 1, \dots, n$, have the density
function

(4.1) $g(B) = (2\pi)^{-3n/2}\, |B|^{n/2} \exp\Big[-\tfrac12 \sum_{k=1}^{n} \sum_{i,j=1}^{3} b_{ij}\, y_{ik}\, y_{jk}\Big]$

where 1) $B = \{b_{ij}\}$ is a positive definite (symmetric) 3 × 3 matrix, 2) $y_{ij}$ is the
value assumed by $Y_{ij}$. The null hypothesis $H_0$ asserts that a given multiple
correlation coefficient is zero, say that of $Y_1$ with $Y_2$ and $Y_3$, i.e.,

(4.2) $b_{12} = b_{21} = b_{13} = b_{31} = 0.$

The test is to be made on the level of significance $\alpha$, i.e., if $B_0$ is any matrix
which satisfies (4.2), and if $G$ is a critical region for testing $H_0$, then

(4.3) $P\{G \mid B_0\} = \alpha$

where the symbol in the left member means the probability of $G$ according
to $g(B_0)$.

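Write

$$n\, s_{ij} = \sum_{k=1}^{n} y_{ik}\, y_{jk}, \qquad S = \begin{bmatrix} s_{22} & s_{23} \\ s_{32} & s_{33} \end{bmatrix}.$$

Let $M(c_{11}, C)$ be the manifold in the $3n$-space of

$$y_{11}, \dots, y_{ik}, \dots, y_{3n}$$

where $s_{11} = c_{11}$, $S = C$. First we prove the following.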

Theorem 6. Any region $G$ which satisfies (4.3) must have the property that the
fraction of the area of $M(c_{11}, C)$ which lies in $G$ is $\alpha$, for any positive $c_{11}$ and any
positive definite 2 × 2 matrix $C = \{c_{ij}\}$. (We remind the reader that exceptional
sets of measure zero are not precluded.)

Proof. Let $\psi(c_{11}, C)$ be the fraction of the area of $M(c_{11}, C)$ in $G$. Recall
equation (4.3) and the fact that $s_{11}, s_{22}, s_{23}, s_{33}$ are sufficient statistics for the
elements of $B_0$. On the manifold $M(c_{11}, C)$ the conditional density is uniform.
Employing Wishart's distribution [6] we conclude that

(4.4) $K' \int \psi(s_{11}, S)\, |B_0|^{n/2}\, |S|^{(n-3)/2}\, s_{11}^{(n-2)/2} \exp\Big\{-\tfrac{n}{2}\big(b_{11} s_{11} + b_{22} s_{22} + 2 b_{23} s_{23} + b_{33} s_{33}\big)\Big\}\, ds_{11}\, ds_{22}\, ds_{23}\, ds_{33} \equiv \alpha$

where $K'$ is a suitable constant which need not concern us. Here the symbol




"$\equiv$" means identically in $b_{11}, b_{22}, b_{23}, b_{33}$, provided only that $b_{11} > 0$, $b_{22} > 0$,
$b_{22} b_{33} - b_{23}^2 > 0$. Of course $s_{11}$ is distributed independently of $s_{22}, s_{23}, s_{33}$.
Proceeding as in section 2, we can, by differentiation with respect to the $b_{ij}$,
obtain all the moments of the $s_{ij}$. Now let the $b_{ij}$ take any admissible constant
values. The moments of the $s_{ij}$ are then seen to satisfy the criterion of Cramér
and Wold [7, Th. 2], and consequently essentially uniquely determine the
distribution of the $s_{ij}$. The desired conclusion follows as before.

The six parameters which uniquely determine the trivariate normal distribution
(of $Y_1, Y_2, Y_3$) with zero means may be taken to be the following:

1) The covariance matrix $\{\sigma_{ij}\}$, $i, j = 2, 3$, of $Y_2$ and $Y_3$.

2) The partial regression coefficients $\beta_2, \beta_3$ of $Y_1$ on $Y_2$ and $Y_3$. These are
defined as follows: Let $E(Y_1 \mid Y_2 = y_2, Y_3 = y_3)$ denote the conditional expected
value of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$. Then

$$E(Y_1 \mid Y_2 = y_2, Y_3 = y_3) = \beta_2 y_2 + \beta_3 y_3.$$

3) The conditional variance $\omega^2$ of $Y_1$, given $Y_2 = y_2$, $Y_3 = y_3$.

The population multiple correlation coefficient $\bar R$ of $Y_1$ with $Y_2$ and $Y_3$ is then
defined by

$$\frac{\bar R^2}{1 - \bar R^2} = \frac{\beta_2^2 \sigma_{22} + 2 \beta_2 \beta_3 \sigma_{23} + \beta_3^2 \sigma_{33}}{\omega^2}.$$

The six parameters above may be chosen arbitrarily, provided only that $\{\sigma_{ij}\}$
is positive definite. $\bar R$ and $\omega$ are, by definition, non-negative.
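
The defining relation for the population multiple correlation coefficient can be
solved explicitly, which the following minimal sketch does. The function name and
argument names are hypothetical, chosen only for this illustration; the six
arguments are the six parameters listed above.

```python
import math

def population_multiple_correlation(beta2, beta3, s22, s23, s33, omega2):
    """Population multiple correlation of Y1 with (Y2, Y3).

    Solves R^2 / (1 - R^2) = q / omega2 for R >= 0, where
    q = beta2^2 s22 + 2 beta2 beta3 s23 + beta3^2 s33 is the variance of
    the regression beta2*Y2 + beta3*Y3 and omega2 > 0 is the conditional
    variance of Y1.
    """
    q = beta2 ** 2 * s22 + 2.0 * beta2 * beta3 * s23 + beta3 ** 2 * s33
    return math.sqrt(q / (q + omega2))

# Equal explained and residual variance gives R^2 = 1/2.
r_example = population_multiple_correlation(1.0, 0.0, 1.0, 0.0, 1.0, 1.0)
# Under the null hypothesis both regression coefficients vanish and R = 0.
r_null = population_multiple_correlation(0.0, 0.0, 2.0, 0.5, 3.0, 1.0)
print(r_example, r_null)
```

The null hypothesis (4.2) corresponds to both regression coefficients being zero,
which is the second call.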

Let $y_i$ be the column vector $(y_{i1}, \dots, y_{in})$; let $y_i'$ be its transpose, and let $y$
denote the point $(y_{11}, y_{12}, \dots, y_{1n}, y_{21}, \dots, y_{3n})$ in $3n$-space. Let $z(y) =
z(y_1, y_2, y_3)$ be the component of $y_1$ in the plane of $y_2$ and $y_3$; let $r = |z(y)|$ and $\theta$
the angle between $z$ and $y_2$, measured positively say in the direction of $y_3$.
Finally let $h$ be the absolute value of the vector $y_1 - z(y_1, y_2, y_3)$.

We intend now to investigate the set theoretic structure of tests whose power
is a function only of $\bar R$, and for this purpose prove the following:

Theorem 7. Let $H$ be a region whose power is a function only of $\bar R$. Let
$V(h, r, \theta, s_{22}, s_{23}, s_{33})$ be the fraction of the "volume" of the manifold on which
$h, r, \theta, s_{22}, s_{23}, s_{33}$ are fixed which is contained in $H$. With the usual exception of a
set of measure zero, for fixed $h, r, s_{22}, s_{23}, s_{33}$, the quantity $V$ above is constant for
all $\theta$.

Later, after this theorem is proved, we shall write $V$ without exhibiting $\theta$.
This procedure is justified by Theorem 7.

Proof. Suppose the theorem false, and proceed as in Theorem 3. A suitable⁴
rotation of the radius vector $z(y)$ implies an orthogonal transformation $T$ on the
generic point $y$ which leaves $h, r, s_{22}, s_{23}$, and $s_{33}$ unaltered, and takes the region $H$
into a region $H^*$ such that $H$ and $H^*$ differ on a set of positive measure. $T$ leaves
$R$ invariant, hence leaves invariant $\bar R$ which uniquely determines the distribution


4 See footnote 1. 




of $R$. Hence an argument almost the same as that which led us to (3.5) yields the
conclusion that the power of $H$ and the power of $H^*$ are equal, identically in $B$.
Proceeding as in Theorem 3, we obtain two essentially different density functions
in $h, r, \theta, s_{22}, s_{23}, s_{33}$, whose integrals over the entire space are identical in the
elements of $B$. From these functions we obtain two different density functions in
$s_{ij}$ $(i, j = 1, 2, 3)$, with identical moments (obtained by differentiation with
respect to the elements of $B$). The rest of the proof is essentially no different
from that of Theorem 3.

Theorem 8. In order that the power of $H$ be a function of $\bar R$ alone, it is necessary
and sufficient that, with the usual exception of a set of measure zero, $V(h, r, s_{22}, s_{23}, s_{33})$
be a function only of $h/r$ (i.e., of $R$).

The proof of this theorem is essentially the same as the proof of Theorem 4.
The place of the transformation $Z$ is taken by one which consists of any linear
transformation on the vectors $y_2$ and $y_3$, the addition of a constant angle to $\theta$
(rotation of $z(y)$), and multiplication of the vector $y_1$ by a positive scalar $c$.
This transformation leaves $R$ invariant. In the proof of sufficiency we use the
distribution of $R$ (see, for example, [10, p. 384, equation (15.55)]). The remainder
of the proof is essentially the same as that of Theorem 4.

Theorem 9. Among all tests $H$ which have the properties described in the conclusions
of Theorems 6 and 7, the classical test based on $R$ is the most powerful.

As a corollary to this theorem we have the following result due to Simaika
[3]: Of all tests $H$ whose power is a function of $\bar R$ only, the classical test based
on $R$ is the most powerful.

Simaika's result also follows easily from Theorem 8 and the density function
of $R$ in the same manner that Hsu's result followed from Theorem 4 and the
density function of $v$.

In the course of the proof of Theorem 9, the various symbols W, with or 
without subscripts, will denote suitable functions of the variables exhibited, 
and the various symbols k, with or without subscripts, will denote suitable 
constants. 

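We have that

(4.5) $P\{H \mid B\} = \int_H (2\pi)^{-3n/2}\, |B|^{n/2} \exp\Big\{-\tfrac12 \sum_{k} \sum_{i,j} b_{ij}\, y_{ik}\, y_{jk}\Big\}\, dy_{11} \cdots dy_{3n}$

$= \int_H (2\pi\omega^2)^{-n/2} \exp\Big\{-\frac{1}{2\omega^2}\, \big|y_1 - (\beta_2 y_2 + \beta_3 y_3)\big|^2\Big\}\, W_0(s_{22}, s_{23}, s_{33}, \{\sigma_{ij}\})\, dy_{11} \cdots dy_{3n}$

$= (2\pi\omega^2)^{-n/2} \int_H \exp\Big\{\frac{1}{\omega^2}\,(\beta_2 y_2 + \beta_3 y_3)' z\Big\} \exp\Big\{-\frac{1}{2\omega^2}\big[y_1' y_1 + n(\beta_2^2 s_{22} + 2\beta_2\beta_3 s_{23} + \beta_3^2 s_{33})\big]\Big\}\, W_0(s_{22}, s_{23}, s_{33}, \{\sigma_{ij}\})\, dy_{11} \cdots dy_{3n}.$

Now $(\beta_2 y_2 + \beta_3 y_3)' z$ is a function only of $\beta_2, \beta_3, s_{22}, s_{23}, s_{33}, r$, and $\theta$. Also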





$h^2 + r^2 = n s_{11} = y_1' y_1$. Thus

(4.6) $P\{H \mid B\} = \int V(h, r, s_{22}, s_{23}, s_{33})\, W_1(h, r, s_{22}, s_{23}, s_{33}, \{\beta\}) \exp\Big\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\Big\}\, d\theta\, dh\, dr\, ds_{22}\, ds_{23}\, ds_{33}$

$= \int V(h, r, s_{22}, s_{23}, s_{33})\, W_1(h, r, s_{22}, s_{23}, s_{33}, \{\beta\})\, (4hr)^{-1} \exp\Big\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\Big\}\, d\theta\, dh^2\, dr^2\, ds_{22}\, ds_{23}\, ds_{33}$

$= \int V\big(\sqrt{y_1' y_1 - r^2}, r, s_{22}, s_{23}, s_{33}\big)\, W_2\big(\sqrt{y_1' y_1 - r^2}, r, s_{22}, s_{23}, s_{33}, \{\beta\}\big) \exp\Big\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\Big\}\, d\theta\, dr^2\, d(y_1' y_1)\, ds_{22}\, ds_{23}\, ds_{33}.$

Integrating with respect to $\theta$ and designating

$$W_2 \int \exp\Big\{\frac{1}{\omega^2}(\beta_2 y_2 + \beta_3 y_3)' z\Big\}\, d\theta$$

by $W\big(\sqrt{y_1' y_1 - r^2}, r, s_{22}, s_{23}, s_{33}, \{\beta\}\big)$ we observe that just as in (2.4), $W$ is
monotonically increasing in $r$ (all other variables fixed). Thus we have

(4.7) $P\{H \mid B\} = \int V\, W\, dr^2\, d(y_1' y_1)\, ds_{22}\, ds_{23}\, ds_{33}.$

In constructing $H$ only the function $V$ is at our disposal, and this subject to the
limitations imposed by the conclusions of Theorems 6 and 7 and the fact that
$h^2 + r^2 = y_1' y_1 = n s_{11}$. The function $W$ is not within our control at all. With $y_1' y_1$,
$s_{22}, s_{23}, s_{33}$ fixed, $W$ is monotonically increasing with $r$. To maximize the power
it is therefore best to distribute the "mass" so that $V$ is as large as possible for
large values of $r$ and hence of $R$. This implies the classical test and proves the
theorem.


5. Stringency of the classical tests. Wald [8] calls a test $T_1$ "most stringent"
if the following is true: Let $\{T\}$ be the totality of tests. Let $\theta$ be the generic
point in the parameter space, and $P\{T \mid \theta\}$ be the power of $T$ at the point $\theta$.
Let $T_2$ be any test other than $T_1$. Then

$$\sup_{\theta}\Big[\sup_{\{T\}} P\{T \mid \theta\} - P\{T_1 \mid \theta\}\Big] \le \sup_{\theta}\Big[\sup_{\{T\}} P\{T \mid \theta\} - P\{T_2 \mid \theta\}\Big].$$

Of course, we have omitted to specify the totality $\{T\}$. One can admit all tests
whose size $\le \alpha$, a given constant between 0 and 1, or restrict one's self to tests
whose size is exactly $\alpha$. We shall do the latter.

Under these circumstances we shall prove that the classical test of a linear
hypothesis is most stringent. Our proof will occupy but a few lines, and is an easy



consequence of the structure of the classical tests as described in the lemma of
section 2. The result itself is a special case of an unpublished theorem due to
G. H. Hunt and C. M. Stein, and all priority on this result is theirs.

Return then to the notation of section 2. Let $\sigma$ be fixed at any arbitrary
positive value, and the surface

$$\lambda = c$$

be that one on which

$$\omega_1(\eta) = \sup_{\{T\}} P\{T \mid \eta\} - P\{L_1 \mid \eta\}$$

is a maximum, where $L_1$ is the classical test of the linear hypothesis. It is clear
that this maximum is actually achieved, and that $\omega_1(\eta)$ is a constant on the
surface $\lambda = c$. Let $L_2$ be any other test (of size $\alpha$), and $\omega_2(\eta)$ be the
corresponding function for $L_2$. We have only to show that on the surface $\lambda = c$
we cannot have everywhere $\omega_2(\eta) < \omega_1(\eta)$ and our proof is complete. If everywhere
on the surface $\lambda = c$ we had $\omega_2(\eta) < \omega_1(\eta)$, we would have, also on the same
surface, $P\{L_2 \mid \eta\} > P\{L_1 \mid \eta\}$. This would, however, violate Wald's Theorem 2
(section 2) and proves the desired result.

REFERENCES

[1] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938),
p. 231.

[2] P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika,
Vol. 32 (1941), p. 62.

[3] J. B. Simaika, "On an optimum property of two important statistical tests,"
Biometrika, Vol. 32 (1941), p. 70.

[4] A. Wald, "On the power function of the analysis of variance test," Annals of Math.
Stat., Vol. 13 (1942), p. 434.

[5] J. A. Shohat and J. D. Tamarkin, The Problem of Moments, The American
Mathematical Society, New York, 1943.

[6] John Wishart, "The generalized product moment distribution, etc.," Biometrika,
Vol. 20A (1928), p. 32.

[7] H. Cramér and H. Wold, "Some theorems on distribution functions," Lond. Math.
Soc. Jour., Vol. 11 (1936).

[8] A. Wald, "Tests of statistical hypotheses concerning several parameters when the
number of observations is large," Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.

[9] P. C. Tang, "The power function of the analysis of variance etc.," Stat. Res. Memoirs,
Vol. 2 (1938) (University of London), p. 126.

[10] M. G. Kendall, The Advanced Theory of Statistics, Vol. 1, Charles Griffin and
Company, London, 1945.

[11] S. Saks, Theory of the Integral (Second Edition), G. E. Stechert and Company, New
York, 1937.

[12] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical
hypotheses," Roy. Soc. London Phil. Trans., Ser. A, Vol. 231 (1933), pp. 289-337.

[13] Eberhard Hopf, Ergodentheorie, Chelsea, New York.



APPLICATION OF THE METHOD OF MIXTURES TO QUADRATIC 
FORMS IN NORMAL VARIATES 

By Herbert Robbins and E. J. G. Pitman 
Institute of Statistics, University of North Carolina 

1. Summary. The method of mixtures, explained in Section 2, is applied to
derive the distribution functions of a positive quadratic form in normal variates
and of the ratio of two independent forms of this type.


2. The method of mixtures. If

(1) $F_0(x), F_1(x), \cdots$

is any sequence of distribution functions, and if

(2) $c_0, c_1, \cdots$

is any sequence of constants such that

(3) $c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1$

(all summations will be from 0 to $\infty$ unless otherwise noted), then the function

(4) $F(x) = \sum c_j F_j(x)$

is called a mixture of the sequence (1).

It is sometimes helpful to interpret $F(x)$ in the following manner. Let $J, X_0,
X_1, \cdots$ be variates such that $J$ has the distribution $P[J = j] = c_j$ $(j = 0, 1, \cdots)$
and such that $X_j$ has the distribution function $F_j(x)$. Let $X$ be a variate such
that the conditional distribution function of $X$ given $J = j$ is $F_j(x)$. Then the
distribution function of $X$ is

$$P[X \le x] = \sum P[J = j] \cdot P[X \le x \mid J = j] = \sum c_j F_j(x) = F(x).$$

This interpretation of $F(x)$ will, however, not be involved in the present paper.

The following statements are proved in [1]. If $x = (x_1, \dots, x_n)$ is a vector
variable the function $F(x)$ defined by (4) is a distribution function, and for
any Borel set $S$,

(5) $\int_S dF(x) = \sum c_j \int_S dF_j(x).$

More generally, if $g(x)$ is any Borel measurable function then

(6) $\int_{-\infty}^{\infty} g(x)\, dF(x) = \sum c_j \int_{-\infty}^{\infty} g(x)\, dF_j(x)$

whenever the left hand side of (6) exists. In particular, the characteristic function



METHOD OP MIXTURES 


553 


$\varphi(t)$ corresponding to $F(x)$ is

(7) $\varphi(t) = \sum c_j \varphi_j(t)$,

where $\varphi_j(t)$ is the characteristic function corresponding to $F_j(x)$.

If each $F_j(x)$ has a derivative $f_j(x)$ then $F(x)$ has a derivative $f(x)$ given by

(8) $f(x) = \sum c_j f_j(x)$,

provided that this series converges uniformly in some interval including $x$.
Conversely, if (8) is the relation between the frequency functions and if the
series is uniformly convergent in every finite interval, then the relation between
the distribution functions is given by (4). In practice we deduce (4) from (8), or,
using the uniqueness theorem for characteristic functions, from (7).

As regards computation, we observe that for any integers $0 \le p_1 \le p_2$ and
for any $x$ it follows from (3) and (4) that

(9) $0 \le F(x) - \sum_{p_1}^{p_2} c_j F_j(x) = \sum_{0}^{p_1 - 1} c_j F_j(x) + \sum_{p_2 + 1}^{\infty} c_j F_j(x)$

$\le \sup_{j < p_1} \{F_j(x)\} \cdot \Big(\sum_{0}^{p_1 - 1} c_j\Big) + \sup_{j > p_2} \{F_j(x)\} \cdot \Big(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$

The existence of these upper bounds (the last a uniform one) for the error term
when the series (4) is replaced by a finite sum shows that series expansions of the
mixture type (4) are especially well adapted to computational work.
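
The truncation bound can be tried numerically. The sketch below is illustrative
only: the helper `truncated_mixture` and the particular mixture (geometric weights
on exponential distribution functions) are assumptions made for the example, not
taken from the paper.

```python
import math

def truncated_mixture(x, coeff, cdf, p1, p2):
    """Partial sum of the mixture series over p1 <= j <= p2, together with
    the uniform error bound 1 - sum_{p1}^{p2} c_j.

    coeff(j) returns c_j and cdf(j, x) returns F_j(x); both are supplied
    by the caller.
    """
    partial = sum(coeff(j) * cdf(j, x) for j in range(p1, p2 + 1))
    bound = 1.0 - sum(coeff(j) for j in range(p1, p2 + 1))
    return partial, bound

# Hypothetical mixture: c_j = (1 - q) q^j, F_j exponential with rate j + 1.
q = 0.5

def coeff(j):
    return (1.0 - q) * q ** j

def cdf(j, x):
    return 1.0 - math.exp(-(j + 1) * x) if x > 0 else 0.0

x0 = 1.0
partial, bound = truncated_mixture(x0, coeff, cdf, 0, 20)
# A much longer partial sum stands in for the full series.
full = sum(coeff(j) * cdf(j, x0) for j in range(200))
print(partial, full - partial, bound)
```

With these weights the uniform bound after 21 terms is $q^{21}$, so the error is
known in advance without evaluating any further distribution functions.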

For some purposes it is useful to consider series expansions of the type (4)
where the $c_j$ may be of both signs and where the series $\sum c_j$ may diverge. Both
parts of (3) will, however, be satisfied in the cases considered here.

If $U$, $V$ are independent variates with respective distribution functions
$F(x)$, $G(x)$ we shall denote the distribution function of any Borel measurable
function $H(U, V)$ by

$$H(U, V)\big(F(x), G(x)\big).$$

Now if $F(x)$, $G(x)$ are both mixtures,

$$F(x) = \sum c_j F_j(x), \qquad G(x) = \sum d_k G_k(x),$$

then by (5),

$$P[H(U, V) \le x] = \iint_{[H \le x]} dF(u)\, dG(v) = \sum\sum c_j d_k \iint_{[H \le x]} dF_j(u)\, dG_k(v),$$

so that

(10) $H(U, V)\big(\sum c_j F_j(x), \sum d_k G_k(x)\big) = \sum\sum c_j d_k\, H(U, V)\big(F_j(x), G_k(x)\big).$



554 


HERBERT ROBBINS AND E. J. Q, PITMAN 


As an application of the principles set forth in this section we shall express
as series of the mixture type (4) the distribution functions of any positive
quadratic form in normal variates and of the ratio of any two independent forms
of this type. Special cases of the problem have been dealt with by Tang [2],
Hsu [3], and many others, but the method of mixtures permits a unified and
simple treatment of the general case.


3. Distribution of a positive quadratic form. We shall denote by $F_n(x)$ the
chi-square distribution function with $n > 0$ degrees of freedom,

(11) $F_n(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)} \int_0^x u^{n/2 - 1} e^{-u/2}\, du \quad (x > 0), \qquad = 0 \quad (x \le 0).$

The corresponding characteristic function is

(12) $\varphi_n(t) = \int_0^\infty e^{itx}\, dF_n(x) = (1 - 2it)^{-n/2} = w^{n/2},$

where we have set $w = (1 - 2it)^{-1}$. We shall denote by $\chi_n^2$ any variate with
the distribution function (11).

Let $a$ be any constant such that $a > 0$. The characteristic function of the
variate $a \cdot \chi_n^2$ is

(13) $(1 - 2iat)^{-n/2} = [a(1 - 2it) - (a - 1)]^{-n/2} = a^{-n/2}\, w^{n/2}\, [1 - (1 - 1/a)\, w]^{-n/2}.$

By the binomial theorem we have for any $a > 0$,

(14) $a^{-n/2}\, [1 - (1 - 1/a)\, z]^{-n/2} = \sum c_j z^j,$

where

(15) $c_j = a^{-n/2}\, \frac{\tfrac{n}{2}(\tfrac{n}{2} + 1) \cdots (\tfrac{n}{2} + j - 1)}{j!}\, \Big(1 - \frac{1}{a}\Big)^j \quad (j = 0, 1, \cdots).$

For $a \ge 1$ we see from (15) that all the $c_j$ are non-negative. Likewise for $a > \tfrac12$
(and hence a fortiori for $a \ge 1$) we have $|1 - 1/a|^{-1} > 1$ so that (14) holds
for all $|z| \le 1$; setting $z = 1$ it follows that the sum of all the $c_j$ is equal to 1.
Hence for $a \ge 1$,

$$c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1.$$

Since $|w| = |1 - 2it|^{-1} \le 1$ for all real $t$ it follows from (13) and (14) that
for $a \ge 1$,

(16) $(1 - 2iat)^{-n/2} = \sum c_j w^{n/2 + j} = \sum c_j (1 - 2it)^{-n/2 - j} = \sum c_j \varphi_{n + 2j}(t).$



Hence for $a \ge 1$ the distribution function $F_n(x/a)$ of the variate $a \cdot \chi_n^2$ is a mixture
of $\chi^2$ distribution functions,

(17) $F_n(x/a) = \sum c_j F_{n + 2j}(x),$

where the $c_j$, determined by the identity (14), are the probabilities of a negative
binomial distribution.
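
Relation (17) lends itself to a direct numerical check. The following is a minimal
sketch using only the Python standard library; the helper names `chi2_cdf` and
`scale_coeffs`, the series used for the chi-square distribution function, and the
test values of $n$, $a$, $x$ are all choices made for this illustration.

```python
import math

def chi2_cdf(x, n):
    """Chi-square distribution function with n > 0 degrees of freedom,
    via the power series of the regularized lower incomplete gamma
    function P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = n / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (s + k)
        total += term
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))

def scale_coeffs(n, a, terms):
    """First `terms` coefficients c_j of (15), built by the recurrence
    c_j = c_{j-1} * (n/2 + j - 1) / j * (1 - 1/a)."""
    c = [a ** (-n / 2.0)]
    for j in range(1, terms):
        c.append(c[-1] * (n / 2.0 + j - 1) / j * (1.0 - 1.0 / a))
    return c

n, a, x = 3, 2.5, 4.0
c = scale_coeffs(n, a, 200)
lhs = chi2_cdf(x / a, n)                      # F_n(x/a)
rhs = sum(cj * chi2_cdf(x, n + 2 * j) for j, cj in enumerate(c))
print(lhs, rhs, sum(c))
```

The recurrence avoids computing factorials separately and makes the negative
binomial form of the weights visible: each weight is the previous one multiplied
by a factor that tends to $1 - 1/a$.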

It may, in fact, be proved by a direct analysis, which we omit here, that (17)
holds for any $a > 0$. However, if $a < 1$ then the $c_j$ will be of alternating sign,
and if $a < \tfrac12$ then the series $\sum c_j$ will diverge. This shows incidentally that a
relation of the form (4) can hold even though the series $\sum c_j$ diverges and hence
the corresponding relation (7) does not hold for $t = 0$.

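Theorem 1. Let

$$X = a\,(\chi_m^2 + a_1 \chi_{m_1}^2 + \cdots + a_r \chi_{m_r}^2),$$

where the chi-square variates are independent and $a, a_1, \dots, a_r$ are positive constants
such that

$$a_i \ge 1 \quad (i = 1, \dots, r).$$

Define constants $c_j$ by the identity¹

(18) $\prod_{i=1}^{r} a_i^{-m_i/2}\, \Big[1 - \Big(1 - \frac{1}{a_i}\Big) z\Big]^{-m_i/2} = \sum c_j z^j \quad (|z| \le 1);$

then obviously

$$c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1.$$

Let

$$M = m + m_1 + \cdots + m_r;$$

then for every $x$,

(19) $P[X \le x] = \sum c_j \cdot F_{M + 2j}(x/a).$

For any integers $0 \le p_1 \le p_2$ and every $x$,

(20) $0 \le P[X \le x] - \sum_{p_1}^{p_2} c_j F_{M + 2j}(x/a)$

$\le F_M(x/a) \cdot \Big(\sum_{0}^{p_1 - 1} c_j\Big) + F_{M + 2p_2 + 2}(x/a) \cdot \Big(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$

Proof. The characteristic function of $X/a$ is, by (13) and (18),

$$\varphi(t) = w^{M/2} \cdot \prod_{i=1}^{r} a_i^{-m_i/2}\, \Big[1 - \Big(1 - \frac{1}{a_i}\Big) w\Big]^{-m_i/2} = \sum c_j w^{M/2 + j} = \sum c_j \varphi_{M + 2j}(t).$$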

¹ If $r = 0$ we regard the left hand side of (18) as having the value 1.




Hence for any $y$,

$$P[X/a \le y] = \sum c_j F_{M + 2j}(y),$$

whence (19) follows on setting $x = ay$. Finally, since $F_n(x)$ is a decreasing function
of $n$ for fixed $x$, (20) follows from (9).

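It should be observed that the coefficients $c_j$ determined by (18) can be
written explicitly as the multiple Cauchy products

$$c_j = \sum_{i_1 + \cdots + i_r = j} \{c_{1, i_1} \cdots c_{r, i_r}\},$$

where

$$c_{i, j} = a_i^{-m_i/2}\, \frac{\tfrac{m_i}{2}(\tfrac{m_i}{2} + 1) \cdots (\tfrac{m_i}{2} + j - 1)}{j!}\, \Big(1 - \frac{1}{a_i}\Big)^j \quad (i = 1, \dots, r;\ j = 0, 1, \cdots).$$

The $c_j$ may be computed stepwise by the relations

$$c_j^{(1)} = c_{1, j}, \qquad c_j^{(s)} = \sum_{i=0}^{j} \{c_i^{(s-1)} \cdot c_{s, j-i}\} \quad (s = 2, \dots, r), \qquad c_j^{(r)} = c_j.$$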
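
The stepwise relations above amount to repeated convolution of coefficient
sequences. A minimal sketch follows; the function names are hypothetical, and the
sequences are truncated to a fixed number of terms, which leaves the retained
coefficients exact because coefficient $j$ of a Cauchy product depends only on
coefficients of index at most $j$.

```python
def factor_coeffs(m, a, terms):
    """Coefficients c_{i,j} of one factor
    a^{-m/2} [1 - (1 - 1/a) z]^{-m/2}, for j = 0, ..., terms - 1."""
    c = [a ** (-m / 2.0)]
    for j in range(1, terms):
        c.append(c[-1] * (m / 2.0 + j - 1) / j * (1.0 - 1.0 / a))
    return c

def cauchy_product(u, v):
    """First len(u) coefficients of the product of two power series."""
    return [sum(u[i] * v[j - i] for i in range(j + 1)) for j in range(len(u))]

def mixture_coeffs(ms, scales, terms):
    """Stepwise computation: start from the first factor and fold in the
    remaining factors one Cauchy product at a time."""
    c = factor_coeffs(ms[0], scales[0], terms)
    for m, a in zip(ms[1:], scales[1:]):
        c = cauchy_product(c, factor_coeffs(m, a, terms))
    return c

# Two factors with a common scale collapse to one factor with m1 + m2.
combined = mixture_coeffs([2, 3], [1.5, 1.5], 50)
single = factor_coeffs(5, 1.5, 50)
# A genuinely mixed case: the weights are non-negative and sum to 1.
mixed = mixture_coeffs([2, 4, 1], [1.2, 2.0, 3.0], 400)
print(max(abs(x - y) for x, y in zip(combined, single)), sum(mixed))
```

The first comparison is a consistency check on the product form of the generating
identity; the second shows the normalization that makes the coefficients usable as
mixture weights.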


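4. Distribution of a ratio. The ratio $\chi_m^2/\chi_n^2$ of two independent chi-square
variates has the distribution function

(21) $F_{m,n}(x) = \frac{\Gamma(\tfrac{m+n}{2})}{\Gamma(\tfrac{m}{2})\, \Gamma(\tfrac{n}{2})} \int_0^x u^{m/2 - 1} (1 + u)^{-(m+n)/2}\, du \quad (x > 0), \qquad = 0 \quad (x \le 0).$

In computational work we can use the tables of the Beta distribution function

$$I_x(r, s) = \frac{\Gamma(r + s)}{\Gamma(r) \cdot \Gamma(s)} \int_0^x u^{r-1} (1 - u)^{s-1}\, du \quad (0 \le x \le 1), \qquad = 0 \quad (x < 0), \qquad = 1 \quad (x > 1),$$

together with the identity

$$F_{m,n}(x) = I_{x/(1+x)}(\tfrac{m}{2}, \tfrac{n}{2}).$$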
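
The identity between the ratio distribution and the Beta distribution function can
be checked numerically. The sketch below substitutes Simpson's rule for the printed
tables, which is an assumption of this example; the function names are
hypothetical, and the code is restricted to $m \ge 2$ and $n \ge 2$ so that both
integrands stay bounded.

```python
import math

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0

def ratio_cdf_direct(x, m, n):
    """F_{m,n}(x) computed from the defining integral (21); m, n >= 2."""
    const = math.exp(math.lgamma((m + n) / 2.0)
                     - math.lgamma(m / 2.0) - math.lgamma(n / 2.0))
    return const * simpson(
        lambda u: u ** (m / 2.0 - 1) * (1 + u) ** (-(m + n) / 2.0), 0.0, x)

def beta_cdf(y, r, s):
    """Beta distribution function I_y(r, s) for 0 <= y <= 1; r, s >= 1."""
    const = math.exp(math.lgamma(r + s) - math.lgamma(r) - math.lgamma(s))
    return const * simpson(
        lambda u: u ** (r - 1) * (1 - u) ** (s - 1), 0.0, y)

x = 1.5
direct = ratio_cdf_direct(x, 4, 6)
via_beta = beta_cdf(x / (1 + x), 2.0, 3.0)
# For m = n = 2 the distribution function reduces to x / (1 + x).
simple = ratio_cdf_direct(3.0, 2, 2)
print(direct, via_beta, simple)
```

The case $m = n = 2$ gives a closed form because the Beta distribution function
with both parameters equal to 1 is the identity on the unit interval.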


Theorem 2. Let

(22) $X = \frac{a\,(\chi_m^2 + a_1 \chi_{m_1}^2 + \cdots + a_r \chi_{m_r}^2)}{\chi_n^2 + b_1 \chi_{n_1}^2 + \cdots + b_s \chi_{n_s}^2},$

where the $\chi^2$ variates are independent and $a, a_1, \dots, a_r, b_1, \dots, b_s$ are positive



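constants such that

$$a_i \ge 1, \quad b_j \ge 1 \quad (i = 1, \dots, r;\ j = 1, \dots, s).$$

Define constants $c_j$, $d_k$ by the identities

$$\prod_{i=1}^{r} a_i^{-m_i/2}\, \Big[1 - \Big(1 - \frac{1}{a_i}\Big) z\Big]^{-m_i/2} = \sum c_j z^j,$$

$$\prod_{i=1}^{s} b_i^{-n_i/2}\, \Big[1 - \Big(1 - \frac{1}{b_i}\Big) z\Big]^{-n_i/2} = \sum d_k z^k \quad (|z| \le 1);$$

then

$$c_j \ge 0, \quad \sum c_j = 1, \quad d_k \ge 0, \quad \sum d_k = 1.$$

Let

$$M = m + m_1 + \cdots + m_r, \qquad N = n + n_1 + \cdots + n_s;$$

then for every $x$,

$$P[X \le x] = \sum\sum c_j d_k \cdot F_{M + 2j, N + 2k}(x/a),$$

and for any integers $0 \le p_1 \le p_2$, $0 \le q_1 \le q_2$ and every $x$,

$$0 \le P[X \le x] - \sum_{p_1}^{p_2} \sum_{q_1}^{q_2} c_j d_k \cdot F_{M + 2j, N + 2k}(x/a) \le 1 - \Big(\sum_{p_1}^{p_2} c_j\Big) \cdot \Big(\sum_{q_1}^{q_2} d_k\Big).$$

Proof. Let $U$, $V$ denote respectively numerator and denominator of (22).
From Theorem 1,

$$P[U \le x] = \sum c_j F_{M + 2j}(x/a),$$

$$P[V \le x] = \sum d_k F_{N + 2k}(x).$$

Hence by (10), for every $x$,

$$P[X \le x] = P[U/V \le x] = \sum\sum c_j d_k \cdot F_{M + 2j, N + 2k}(x/a).$$

The rest of the theorem is obvious.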

Corollary. Let

$$X = \frac{\chi_M^2}{a \chi_r^2 + b \chi_s^2},$$

where the $\chi^2$ variates are independent and

$$0 < a \le b.$$



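Define

$$\alpha = a/b, \qquad N = r + s,$$

$$c_j = \alpha^{s/2}\, \frac{\tfrac{s}{2}(\tfrac{s}{2} + 1) \cdots (\tfrac{s}{2} + j - 1)}{j!}\, (1 - \alpha)^j \quad (j = 0, 1, \cdots);$$

then

$$c_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum c_j = 1,$$

and for every $x$,

$$P[X \le x] = \sum c_j F_{M, N + 2j}(ax).$$

For any integers $0 \le p_1 \le p_2$ and every $x$,

(23) $0 \le P[X > x] - \sum_{p_1}^{p_2} c_j\, [1 - F_{M, N + 2j}(ax)]$

$\le [1 - F_{M, N}(ax)] \cdot \Big(\sum_{0}^{p_1 - 1} c_j\Big) + [1 - F_{M, N + 2p_2 + 2}(ax)] \cdot \Big(1 - \sum_{0}^{p_1 - 1} c_j - \sum_{p_1}^{p_2} c_j\Big) \le 1 - \sum_{p_1}^{p_2} c_j.$

Proof. Except for (23) this is a special case of Theorem 2. To prove (23)
we observe that

$$P[X > x] = 1 - P[X \le x] = \sum c_j\, [1 - F_{M, N + 2j}(ax)],$$

and since for fixed $m$ and $x$, $F_{m,n}(x)$ is an increasing function of $n$, (23) follows
in the same way as (9).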

5. The non-central case. Let $Y$ be normal (0, 1) and let $X = (Y + d)^2$, where $d$
is any constant. The frequency function of $X$ is, for $x > 0$,

$$f(x) = (2\pi x)^{-1/2}\, e^{-(x + d^2)/2} \cdot \big(e^{d\sqrt{x}} + e^{-d\sqrt{x}}\big)/2.$$

By expanding the last factor into a power series it is easily seen that

(24) $f(x) = \sum p_j \cdot f_{1 + 2j}(x),$

where $f_n(x) = F_n'(x)$ is the chi-square frequency function with $n$ degrees of
freedom and where

$$p_j = e^{-d^2/2} \cdot (\tfrac12 d^2)^j / j! \quad (j = 0, 1, \cdots).$$

Since the identity

(25) $e^{\frac12 d^2 (z - 1)} = \sum p_j z^j \quad (\text{all } z)$

holds, it follows that

$$p_j \ge 0 \quad (j = 0, 1, \cdots), \qquad \sum p_j = 1.$$




The series (24) is uniformly convergent in every finite interval, so that we
can write the distribution function $F(x)$ and characteristic function $\varphi(t)$ of $X$
in the forms

$$F(x) = \sum p_j \cdot F_{1 + 2j}(x),$$

$$\varphi(t) = \sum p_j \cdot \varphi_{1 + 2j}(t) = w^{1/2}\, e^{-\frac12 d^2 (1 - w)},$$

where again we have set $w = (1 - 2it)^{-1}$.

Now let $Y_1, \dots, Y_n$ be independent and normal (0, 1) variates and let

(26) $X = (Y_1 + d_1)^2 + \cdots + (Y_n + d_n)^2,$

where the $d_i$ are constants such that

$$d_1^2 + \cdots + d_n^2 = d^2.$$

The characteristic function of $X$ is then

$$\varphi(t) = w^{n/2}\, e^{-\frac12 d^2 (1 - w)} = \sum p_j w^{n/2 + j} = \sum p_j \varphi_{n + 2j}(t),$$

and hence the distribution function $F(x)$ of $X$ is again a mixture of $\chi^2$ distribution
functions,

(27) $F(x) = \sum p_j \cdot F_{n + 2j}(x),$

where the $p_j$, determined by the identity (25), are the probabilities of a Poisson
distribution with parameter $\lambda = \tfrac12 d^2$. We shall denote the non-central chi-square
variate (26) by $\chi_{n,d}^2$.
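
The Poisson mixture (27) can be evaluated directly and, for a single degree of
freedom, compared with the normal distribution function. The sketch below is
illustrative: the function names, the series used for the central chi-square
distribution function, the truncation at 200 terms, and the test values are all
assumptions of the example.

```python
import math

def chi2_cdf(x, n):
    """Central chi-square distribution function via the power series of
    the regularized lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0:
        return 0.0
    s, z = n / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (s + k)
        total += term
    return total * math.exp(-z + s * math.log(z) - math.lgamma(s))

def noncentral_chi2_cdf(x, n, d, terms=200):
    """Distribution function of the non-central chi-square variate as a
    Poisson mixture with parameter d^2 / 2."""
    lam = 0.5 * d * d
    p = math.exp(-lam)            # p_0
    total = 0.0
    for j in range(terms):
        total += p * chi2_cdf(x, n + 2 * j)
        p *= lam / (j + 1)        # p_{j+1} from p_j
    return total

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

# For n = 1: P[(Y + d)^2 <= x] = Phi(sqrt(x) - d) - Phi(-sqrt(x) - d).
x, d = 2.0, 1.3
mixture_value = noncentral_chi2_cdf(x, 1, d)
direct_value = normal_cdf(math.sqrt(x) - d) - normal_cdf(-math.sqrt(x) - d)
print(mixture_value, direct_value)
```

Setting $d = 0$ puts all the Poisson weight on the first term, so the routine then
returns the central chi-square distribution function.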

We can now generalize Theorems 1 and 2 in a straightforward manner to 
cover non-central chi-square variates. We shall state only the generalization 
of the Corollary of Theorem 2 to the case in which the numerator is non-central. 


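Theorem 3. Let

$$X = \frac{\chi_{M,d}^2}{a \chi_r^2 + b \chi_s^2},$$

where the $\chi^2$ variates are independent and

$$0 < a \le b.$$

Define

$$\lambda = \tfrac12 d^2, \qquad \alpha = a/b, \qquad N = r + s,$$

$$p_j = e^{-\lambda} \lambda^j / j! \quad (j = 0, 1, \cdots),$$

$$c_k = \alpha^{s/2}\, \frac{\tfrac{s}{2}(\tfrac{s}{2} + 1) \cdots (\tfrac{s}{2} + k - 1)}{k!}\, (1 - \alpha)^k \quad (k = 0, 1, \cdots);$$

then

$$p_j \ge 0, \quad \sum p_j = 1, \quad c_k \ge 0, \quad \sum c_k = 1,$$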



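and for every $x$,

$$P[X \le x] = \sum\sum p_j c_k \cdot F_{M + 2j, N + 2k}(ax).$$

For any integers $0 \le g_1 \le g_2$, $0 \le h_1 \le h_2$,

$$0 \le P[X \le x] - \sum_{g_1}^{g_2} \sum_{h_1}^{h_2} p_j c_k \cdot F_{M + 2j, N + 2k}(ax) \le 1 - \Big(\sum_{g_1}^{g_2} p_j\Big) \cdot \Big(\sum_{h_1}^{h_2} c_k\Big).$$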

REFERENCES

[1] Herbert Robbins, "Mixture of distributions," Annals of Math. Statistics, Vol. 19
(1948), p. 360.

[2] P. C. Tang, "The power function of the analysis of variance tests with tables and
illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), p. 126.

[3] P. L. Hsu, "Contributions to the theory of 'Student's' t-test as applied to the problem
of two samples," ibid., p. 1.



THE JOINT DISTRIBUTION OF SERIAL CORRELATION 
COEFFICIENTS 

By M. H. Quenouille
Rothamsted Experimental Station 

1. Summary. An expression for the joint distribution of serial correlation
coefficients, circularly defined, has been derived. It has been shown that this
distribution possesses properties similar to those already encountered in the
distribution of a single serial correlation coefficient, i.e. it is defined by different
functional forms for various subregions. The distribution thus found is of little
use for computational purposes. Consequently, approximate forms have been
investigated and the suitability of the ordinary partial correlation coefficient
for large-sample testing has been inferred.

2. Introduction. Anderson [1] has derived the distribution of the serial 
correlation coefficient 



where the $\epsilon_i$ are normally and independently distributed with mean $\mu$ and
variance $\sigma^2$ and where a circular definition is employed, so that $\epsilon_{n+i}$ is defined
to be equal to $\epsilon_i$. However, in making a test of any series, we shall usually be
faced with a set of serial correlation coefficients, so that we shall require a joint
distribution function of $r_1, r_2, \dots, r_m$ say. This distribution function is derived
below by an extension of the method used by Koopmans [2].

It should be noted that Bartlett [3] has shown that for large samples the
variances and covariances of the $r_i$ are independent of the distribution of $\epsilon_i$
under fairly wide conditions. This means that the joint distribution function
obtained for normal $\epsilon_i$ will often give a good approximation for non-normal $\epsilon_i$
and can be used as the basis for any test of the correlogram.

3. Conditions on the $r_i$. It is easily seen that the $r_i$ cannot take all values
from $+1$ to $-1$ independently. For example, $r_2$ cannot take a value near $-1$
if $r_1$ takes a value near $+1$. As a result, there will be certain necessary conditions
that the $r_i$ will have to fulfil. It is not difficult to find these conditions, since, if
$y_i$ $(i = 1, 2, \dots, n)$ are any set of variables, then

(i) X («.+>• y *) 2 = (X t j yi yi+j . 

where $\epsilon_i$ may or may not be corrected for the mean and the double-suffix
summation convention is employed.




562 


M. H. QUENOUII/LE 


Thus, provided 0 < m < n/2, we will have 



[Fig. 1]


as a necessary condition that the right-hand side of (1) be positive definite and
this expression will impose necessary conditions upon the joint distribution
of the $r_i$.
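
One concrete instance of such a necessary condition can be observed numerically.
The sketch below computes mean-corrected circular serial correlations and checks
that the symmetric matrix with unit diagonal and off-diagonal entries $r_1, r_1,
r_2$ has a non-negative determinant, which implies $r_2 \ge 2r_1^2 - 1$. The
function name, the sample series, and the restriction to the first two lags are
assumptions of this illustration, not the paper's general condition.

```python
def circular_serial_correlations(series, max_lag):
    """Circular serial correlations r_1, ..., r_max_lag of a series,
    corrected for the mean; index i + j is taken modulo the length."""
    n = len(series)
    mean = sum(series) / n
    e = [v - mean for v in series]
    p = sum(v * v for v in e)
    return [sum(e[i] * e[(i + j) % n] for i in range(n)) / p
            for j in range(1, max_lag + 1)]

# Arbitrary illustrative data (n = 10, so lags 1 and 2 satisfy m < n/2).
series = [1.0, 2.0, 0.5, 3.0, 2.5, 1.5, 4.0, 0.0, 2.2, 3.1]
r1, r2 = circular_serial_correlations(series, 2)

# Determinant of [[1, r1, r2], [r1, 1, r1], [r2, r1, 1]].
det3 = 1.0 + 2.0 * r1 * r1 * r2 - 2.0 * r1 * r1 - r2 * r2
print(r1, r2, det3)
```

The determinant factors as $(1 - r_2)(1 + r_2 - 2r_1^2)$, which makes explicit why
$r_2$ cannot be near $-1$ when $r_1$ is near $+1$.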

Fig. 1 gives the limits of possible values of ri and r$ subject to (a) no restriction, 
(b) n = 0, (c) n = u = 0. 


4. Complex Integration in m Variables. Before finding the joint distribution
function of the $r_i$ some introductory remarks on complex integration involving $m$
variables will be necessary.







SERIAL CORRELATION COEFFICIENTS 


563 


We can evaluate an integral such as 



where $\Im(a_i) = 0$ and $f(z_1, z_2, \dots, z_m)$ is regular in the region $\Im(z_i) \ge 0$, by
successive Cauchy integrations, so the integral has a value $(2\pi i)^m f(a_1, \dots, a_m)$.
In the same manner as for Cauchy integration, it will be possible to distort the
contours over which we integrate so that we can evaluate

$$\int \cdots \int_S \frac{f(z_1, \dots, z_m)}{\prod (z_i - a_i)}\, dz_1 \cdots dz_m,$$

provided that $f(z_1, \dots, z_m)$ is regular in the region defined by $S$, and $(a_1, \dots, a_m)$
is enclosed in this region.

More generally, if we have an integral of the form 



and we make the transformations $w_i = a_{ij} z_j$ and $b_i = a_{ij} c_j$, i.e. $W = AZ$,
$C = A^{-1} B$, it is possible, in the above manner, to evaluate the integral as

(3) 

Suppose we now consider the integral 

f... [ 

8 II - &>) 

where n, > m. We may select a set, (jk , of m equations = f> 3 , and let 
A = [o,J, 5, = [5j], C k = A1 l B k = [c, h ]. Then, we may carry out the integration 
as previously, in this case, summing a series of terms for various combinations 
of m equations out of the possible n. The value of the integral may then be 
written 

< 4 > <2 ” ) n («»«.-w 

l^ak 

where the summation occurs over the points (c.t, <h k , • • • , c nk ) lying m the 
region defined by S, and the product term excludes the set of equations g k . the 
ambiguity of sign in (3) and (4) arises from the Jacobian \A> | , and the sign 
must be chosen which makes the transformation of dzi, • • ■ , dz m yield a positive 




element. It must be noted that it is possible to obtain several expansions of the 
form (4) according to the convention that is employed in defining “enclosure” 
for each of the variables. 

5. Integral form for the joint distribution function. We can, without loss of
generality, assume $\sigma = 1$. Suppose that

$$p = \sum_{i=1}^{n} x_i^2 - \Big(\sum_{i=1}^{n} x_i\Big)^2 \Big/ n, \qquad q_j = \sum_{i=1}^{n} x_i x_{i+j} - \Big(\sum_{i=1}^{n} x_i\Big)^2 \Big/ n,$$

where $x_1, \dots, x_n$ are independent, so that $r_j = q_j/p$. Then by a consideration
of $n$-dimensional space, we can see that $p$ is distributed independently of $r_1, \dots, r_m$,
so that their joint distribution can be written $g(p)\,h(r_1, \dots, r_m)\,dp\,dr_1 \cdots dr_m$.
The joint distribution of $p$ and $q_1, \dots, q_m$ can thus be written

$$f(p, q_1, \dots, q_m)\,dp\,dq_1 \cdots dq_m = \frac{g(p)}{p^m}\, h\Big(\frac{q_1}{p}, \dots, \frac{q_m}{p}\Big)\,dp\,dq_1 \cdots dq_m, \tag{5}$$

where it is not difficult to see that

$$g(p) = \frac{p^{\frac12(n-3)}\, e^{-\frac12 p}}{2^{\frac12(n-1)}\, \Gamma\big(\tfrac12(n-1)\big)}. \tag{6}$$

We can now find the joint distribution of p and qi, ••', 5 m by inverting the 
characteristic function of these variables. This is given by 

{2^ L-I exp [~T + + dtl ’'' dtn ’ 


where 


1/1 A l‘, 


“ [ft , «», • • • , «„] 


a—.1 

IA | = Xjf Cl — 2i v — 2 idjKji), 

“ (1 “ 2 n (1 “ 

l-l 


2iry i 

Kj I « COS —A 
TV 

2i6 f 

** “ 1 - 2 i v ’ 


so that the joint distribution of p and , • • • , q M is 

i r r i 

z(p, ft* "ffJ -* 7 2 - j w+ i “■ j mi ex P {-*(«» + OjQi) } dn d$i • • ■ d0„ 




(2r) m+l 


exp < - 


Js, fTji 

(l — 2v'rj) q } \ dx 1 - • • 

2 J (2v) m 





where S „ is the region bounded by k, — ± 




. Now <S, can be replaced 


1 - 2irj ' ’ 

by region S enclosing the same set of singularities on the real hyperplane, and £ 
can be chosen independent of ij. Thus it will be possible to reverse the order of 

r « 

integration in (7) provided that / 11 — 2ir] converges, i.e. provided 

oo 

n > 2m + 3. Then since 


^ £ (! - 2»ij) 1(n 2m 15 exp {-i v (p - Kj qi )) dr\ 

(p — K ? g,)* {ri “ 2,n-8) 


(n—2m — 1) -p / 71/ — 2 7fl — l\ 

2 r v—2—) 


exp l - Up - K) Qi )} for p > k, q 3 , 


= 0 for p < K,q,, 

we get 

f(p, 2i ■ ■ ■ 9m) = 


p~ip 


( 8 ) 


;‘<""( 2 *;rr (” ~ 2 ” ~ 


/.■• im 


(p - 


t »-i 

n (i - 


l * ■ ■ dx ,„, 


where iS encloses the same singularities as S,, all of which lie in the region 
p > Kj q, If we now use (5) and (6) we get 



In a similar manner, it is possible to derive for $n > 2m + 3$ the joint distribution
of serial correlation coefficients, $\tilde r_1, \dots, \tilde r_m$, uncorrected for the mean, in
the form


( 10 ) 


h(jl ■ • fm) = 



(1 - K,-f 1 )» ln - 2m - 2 > 

- n "I* 

II (1 - K : Kj l) 

. 1-1 J 


(IK\ * (Ik j/t 


6. Extension for variables in an autoregressive scheme. Madow [4] has shown
how to extend the distribution of the serial correlation coefficient for uncorrelated
variables to the case when the variables $x_i$ are connected by a linear Markoff
scheme, $x_i = \rho x_{i-1} + \epsilon_i$, with a normal distribution of the error $\epsilon_i$. It is worth
noting that the method used by Madow can be applied to derive the joint
distribution of serial correlations of variables $x_i$ which are connected by a linear
autoregressive scheme of order $m$, or less,

$$a_0 x_i + a_1 x_{i-1} + \cdots + a_m x_{i-m} = \epsilon_i,$$

where $\epsilon_1, \dots, \epsilon_n$ are normally and independently distributed, and $\epsilon_{n+i} = \epsilon_i$.¹
Under these conditions, the expression (9) will be modified by a factor

J(n-l) 

I - 1 

I (A + 2’ 

where

$$A_0 = \sum_{k=0}^{m} a_k^2, \qquad A_j = \sum_{k=0}^{m-j} a_k a_{k+j},$$

while (10) will be modified by a similar factor with $n$ replacing $n - 1$.



7. Reduction of the distribution function integral. Using the method described 
in section 4, it is now possible to reduce the integral given in (9), if we observe 
that Kyi = and assume n odd. We then have 


h(r v ■ -r m ) 



(1 — Kyry)K"- !!m - s > 

II (1 ~ 

/-I 


dn i■ • • d< m 


(12) 

r (V) 

_V 


i i 

r K k 

»(- 


^ (n — 2m — 




1 

I 


V 2 

J 

n 

iih k 

K it 

K k 


where / = (1, 1, • ■ , 1), r' = (n , r 2 , • • • , r m ), (c yi = (k { , , • • • , * wl ) and 
is the matrix formed from a set g k of the m matrices «y ( arranged in order. 
The factors in the summation can most easily bo determined if we put 
1 i I 

r K k * r ' ” > r «—i) — an d sum over the region for which r m < 


$d_k(r_1, \dots, r_{m-1})$. To demonstrate the manner in which formula (12) works, we
shall consider $m = 2$. From formula (2) we can see that a limit to the possible
values that $r_2$ can take is given by $r_2 = 2r_1^2 - 1$, i.e. by the curve $(\cos\theta, \cos 2\theta)$
in the $(r_1, r_2)$ plane. It is not difficult to see that there are ${}^{\frac12(n-1)}C_2$ possible terms in
(12) and that each of these terms is proportional to the $\frac12(n - 2m - 3)$th power
of the distance from a line in the $(r_1, r_2)$ plane. These lines are the joins of the
points $(\cos 2\pi i/n, \cos 4\pi i/n)$, $i = 1, \dots, \frac12(n - 1)$, and the joins of such points
on the curve $(\cos\theta, \cos 2\theta)$ give the outer limits of the possible values of $r_1$ and $r_2$.

¹ This is a sufficient condition for $x_{n+i} = x_i$.

It can also be seen that these points correspond to the equations $\kappa_i' x = 1$ (each
of these equations determines a plane in 4-dimensional complex space), while
the joins of these points correspond to the singularities defined by, and terms
arising from, pairs of these equations. Furthermore, since the sum of residues in
any plane is zero, the sum of contributions, taken with appropriate signs, arising
from lines through any of these points is zero, i.e. the sum of all possible terms
involving any particular $\kappa_i$ will disappear. This leads to several possible
expansions for $h(r_1, \dots, r_m)$.

If we consider the particular case $n = 9$, then each term in the expansion (12)
is proportional to the distance from one of the lines joining the points $(\cos 2\pi i/9, \cos 4\pi i/9)$,
$i = 1, 2, 3, 4$. These lines may be denoted by $l_{ij}$. Then the contribution from
$l_{ij}$ is given by

g_ KuKij — («n + + 2 (fa + 1) 

(Kl, — K U )(ku — Klk)(Ku — Kl*)(«l; — *»)(*« - Kli)' 

where j > i and n ia = cos . 

3 

The values of this expression are:

$$\begin{aligned}
l_{12} &:\; -1.979 + 2.938\,r_1 - 1.563\,r_2,\\
l_{13} &:\; \phantom{-}0.926 - 2.106\,r_1 + 3.959\,r_2,\\
l_{14} &:\; \phantom{-}1.053 - 0.832\,r_1 - 2.396\,r_2,\\
l_{23} &:\; -5.012 - 3.959\,r_1 - 6.065\,r_2,\\
l_{24} &:\; \phantom{-}3.033 + 6.897\,r_1 + 4.502\,r_2,\\
l_{34} &:\; -4.086 - 6.065\,r_1 - 2.106\,r_2,
\end{aligned}$$

where, for example, the contribution from $l_{12}$ acts in the region for which
$1.563\,r_2 < -1.979 + 2.938\,r_1$. Fig. 2 demonstrates the configuration for this
case. It is seen that the frequency surface is a tetrahedron. As particular
examples of the identities mentioned above we have

$$\begin{aligned}
l_{12} + l_{13} + l_{14} &= 0,\\
-l_{12} + l_{23} + l_{24} &= 0,\\
-l_{13} - l_{23} + l_{34} &= 0.
\end{aligned}$$
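The tabulated lines for $n = 9$ can be checked numerically. The sketch below is an illustration only, not part of the original derivation: it assumes the six rows are $l_{12}, l_{13}, l_{14}, l_{23}, l_{24}, l_{34}$ in that order, takes the three-decimal coefficients as printed, and uses helper names (`evaluate`, `residuals`, `signed_sum`) that are hypothetical. It verifies that each line passes (to rounding) through the two points $(\cos 2\pi i/9, \cos 4\pi i/9)$ it is supposed to join, and that the three identities hold coefficient by coefficient.

```python
import math

n = 9
# Points (cos 2*pi*i/n, cos 4*pi*i/n), i = 1..4, on the curve (cos t, cos 2t).
points = {i: (math.cos(2 * math.pi * i / n), math.cos(4 * math.pi * i / n))
          for i in range(1, 5)}

# Coefficients (constant, r1, r2) as tabulated in the text, three decimals.
lines = {
    (1, 2): (-1.979, 2.938, -1.563),
    (1, 3): (0.926, -2.106, 3.959),
    (1, 4): (1.053, -0.832, -2.396),
    (2, 3): (-5.012, -3.959, -6.065),
    (2, 4): (3.033, 6.897, 4.502),
    (3, 4): (-4.086, -6.065, -2.106),
}

def evaluate(coeffs, point):
    """Value of c0 + c1*r1 + c2*r2 at the given (r1, r2)."""
    c0, c1, c2 = coeffs
    return c0 + c1 * point[0] + c2 * point[1]

def residuals():
    """Largest |l_ij| at the two points that the line l_ij should join."""
    return {key: max(abs(evaluate(c, points[key[0]])),
                     abs(evaluate(c, points[key[1]])))
            for key, c in lines.items()}

def signed_sum(terms):
    """Coefficient-wise sum of sign * l_ij over the given (sign, key) pairs."""
    return tuple(sum(sign * lines[key][k] for sign, key in terms)
                 for k in range(3))

if __name__ == "__main__":
    print(residuals())
    print(signed_sum([(1, (1, 2)), (1, (1, 3)), (1, (1, 4))]))
```

The residuals are of the order of the rounding in the printed coefficients, which is what one would expect if the reading of the table is right.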

For a general value of $m$, we shall find that the hyperplanes joining sets of $m$
points $(\cos 2\pi i/n, \cos 4\pi i/n, \dots, \cos 2\pi m i/n)$ will be singularities on the
frequency hypersurface. The hyperplanes passing through sets of $m$ successive
points will give the limits of possible values of $r_1, \dots, r_m$. Furthermore, the
sum of contributions (with appropriate signs) to the frequency function from
the set of $\frac12(n - 2m + 1)$ hyperplanes passing through any point will be zero.

8. Integral approximation for the distribution function. The expression (12)
is, of course, difficult to use in practice and we require an approximation similar
to that of Koopmans. For this we make use of the integral expression (10)


ra 



hr the joint distribution function of f,, ■•*,?„ and approximate to the factor 

II (1 — V/Oj * This can be done without undue difficulty, but the resulting 

multiple integral does not appear to be capable of easy reduction. This is hardly 
surprising, since from the nature of the distribution of the r,- we should expect 
this approximation to involve R m raised to a suitable power, and this conjecture 
is strengthened by the following considerations: 

a) The distribution of $\tilde r_1$ may be obtained by considering the two sets of
observations $x_1, x_2, \dots, x_{n-1}, x_n$ and $x_2, x_3, \dots, x_n, x_1$ as unrelated, and using
the distribution of the ordinary correlation coefficient corresponding to $n + 3$
pairs of observations (Dixon [6], Quenouille [7]). In the same manner, the $m$ sets
of observations $x_1, x_2, \dots, x_{n-1}, x_n$; $x_2, x_3, \dots, x_n, x_1$; $\dots$; $x_m, x_{m+1}, \dots, x_{m-2}, x_{m-1}$
might be considered as unrelated and the joint distribution of their
correlations, given by Garding [5], will involve $R_m$ raised to a suitable power.

b) The outer limits for the joint distribution of $r_1, r_2, \dots, r_m$ or $\tilde r_1, \tilde r_2, \dots, \tilde r_m$,
for large $n$, will be provided by the equations $R_p = 0$ $(p = 1, \dots, m)$. An
investigation of the properties of the functions $R_1, R_2, \dots, R_m$ might therefore
be expected to throw light upon the joint distribution of $r_1, r_2, \dots, r_m$ or
$\tilde r_1, \tilde r_2, \dots, \tilde r_m$.

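c) $R_p$ is a quadratic in $r_p$ and may be put equal to $R_{p-2}(r_p' - r_p)(r_p - r_p'')$,
where $r_p'$ and $r_p''$ are functions of $r_1, r_2, \dots, r_{p-1}$ giving the limits of the values
that $r_p$ can take for any particular values of $r_1, \dots, r_{p-1}$. Let $Q_p = R_p/R_{p-1}$;
then $Q_p$ is likewise a quadratic in $r_p$, with $r_p$ taking all values between $r_p''$ and $r_p'$, and

$$\int_{r_p''}^{r_p'} Q_p^{\,s}\, dr_p = \frac{1}{Q_{p-1}^{\,s}} \int_{r_p''}^{r_p'} (r_p' - r_p)^s (r_p - r_p'')^s\, dr_p = \frac{B(s + 1, \tfrac12)}{Q_{p-1}^{\,s}} \Big(\frac{r_p' - r_p''}{2}\Big)^{2s+1}.$$

But, by expanding $R_p$ as a bordered determinant, it is not difficult to show that
$r_p' - r_p'' = 2Q_{p-1}$, so that

$$\int_{r_p''}^{r_p'} Q_p^{\,s}\, dr_p = \frac{\Gamma(\tfrac12)\,\Gamma(s + 1)}{\Gamma(s + \tfrac32)}\, Q_{p-1}^{\,s+1}.$$

In particular, if

$$f(r_1, \dots, r_m) = \frac{\Gamma(\tfrac12 n + 1) \cdots \Gamma(\tfrac12 n - m + 2)}{\Gamma(\tfrac12 n + \tfrac12) \cdots \Gamma(\tfrac12 n - m + \tfrac32)\, \pi^{\frac12 m}}\; Q_m^{\frac12(n - 2m + 1)}, \tag{13}$$

and if we integrate with respect to $r_m, r_{m-1}, \dots, r_2$ in turn, we get

$$\int \cdots \int f(r_1, \dots, r_m)\, dr_m \cdots dr_2 = \frac{\Gamma(\tfrac12 n + 1)}{\Gamma(\tfrac12)\,\Gamma(\tfrac12 n + \tfrac12)}\, (1 - r_1^2)^{\frac12(n-1)},$$

which is the approximate distribution of the first serial correlation coefficient,
uncorrected for the mean, as given by Dixon [6].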
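The integration step in c) can be checked numerically for $p = 2$. The following is a minimal sketch under stated assumptions: it takes $R_1 = 1 - r_1^2$ and $R_2 = (1 - r_2)(1 + r_2 - 2r_1^2)$ (the $3 \times 3$ determinant with unit diagonal and off-diagonals $r_1, r_2$), so that $r_2' = 1$ and $r_2'' = 2r_1^2 - 1$, and compares a composite Simpson approximation of $\int Q_2^s\,dr_2$ with $B(s+1, \tfrac12)\,Q_1^{s+1}$. The function names are hypothetical and the quadrature rule is a convenience, not something used in the paper.

```python
import math

def beta(a, b):
    """Beta function B(a, b) via gamma functions."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def simpson(func, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

def q1(r1):
    """Q_1 = R_1 / R_0 = 1 - r1^2."""
    return 1 - r1 ** 2

def q2(r1, r2):
    """Q_2 = R_2 / R_1 with R_2 = (1 - r2)(1 + r2 - 2 r1^2)."""
    return (1 - r2) * (1 + r2 - 2 * r1 ** 2) / q1(r1)

def integral_q2_power(r1, s):
    """Numerical integral of Q_2^s over r2 between its limits 2 r1^2 - 1 and 1."""
    lower, upper = 2 * r1 ** 2 - 1, 1.0
    # max(..., 0) guards against tiny negative rounding at the end points.
    return simpson(lambda r2: max(q2(r1, r2), 0.0) ** s, lower, upper)

def closed_form(r1, s):
    """B(s + 1, 1/2) * Q_1^(s + 1), the value given in the text."""
    return beta(s + 1, 0.5) * q1(r1) ** (s + 1)

if __name__ == "__main__":
    for s in (2, 3):
        print(s, integral_q2_power(0.3, s), closed_form(0.3, s))
```

Integer values of $s$ are used so that the integrand is a polynomial and the quadrature error stays negligible; the identity itself is not restricted to integers.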

The importance of this lies in the fact that the integral corresponding to that
of Koopmans for the joint distribution is


Tihn) 

r($n - m) 





11 ^ li 71 " - ™ -1 


0 

L- 

\ m 

HrXl n 

J | Y |»- 

sm \nxi 

“W y 

dxi a J 


dx i f ■ • dx m 





where $r' = [r_1, \dots, r_m]$,

$$X = \begin{bmatrix}
\cos x_1 & \cos x_2 & \cdots & \cos x_m\\
\cos 2x_1 & \cos 2x_2 & \cdots & \cos 2x_m\\
\vdots & \vdots & & \vdots\\
\cos mx_1 & \cos mx_2 & \cdots & \cos mx_m
\end{bmatrix},
\qquad
Y = \begin{bmatrix}
1 & 1 & \cdots & 1\\
\cos x_1 & \cos x_2 & \cdots & \cos x_m\\
\vdots & \vdots & & \vdots\\
\cos (m-1)x_1 & \cos (m-1)x_2 & \cdots & \cos (m-1)x_m
\end{bmatrix},$$

$I = [1, 1, \dots, 1]$, $\kappa'(\theta) = [\cos\theta, \cos 2\theta, \dots, \cos m\theta]$,
and $S$ is the region given by

$$\begin{vmatrix} 1 & I\\ r & X \end{vmatrix} > 0.$$

This suggests, by analogy, that the
joint distribution function is a polynomial in $r_m$ of degree $2(\frac12 n - m - 1) + 3 =
n - 2m + 1$ which vanishes only when $R_m = 0$. Equation (13) satisfies these
conditions, and in addition it reduces to the known form when $m = 1$ and can
be integrated to give this same form. Thus there is a strong suggestion that (13)
gives an approximate distribution of $r_1, r_2, \dots, r_m$, uncorrected for the mean.

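An alternative form for the constant factor in (13) may be obtained if we
note that

$$\frac{\Gamma(\tfrac12 n - m + 2)}{\Gamma(\tfrac12 n - m + \tfrac32)\, \pi^{\frac12}} = \frac{\Gamma(n - 2m + 3)}{2^{\,n - 2m + 2}\, [\Gamma(\tfrac12 n - m + \tfrac32)]^2}.$$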

d) Now r'j, and r" v can be written in the forms (£p_i + Rp-i)/R p -2 and 
(Sp-\ — Rp-i)/Rp-i , where 


Thus 



ri 

ft 

ft 

... o 


1 

ft 

ft 

• • • rp_, 

= (-1) 1 "- 1 

ft 

1 

ft 

• • • Tf -2 


? p—2 

rp- a 

T p —4 

ft 

, (Sp-i + Rp-i 

-A 

( r . 

S P -i - 1 

\ K 

-2 

V 

v* 


'p-i Fi ( r p Rp-i 

-Sp 

rlYl 


V. L \ 

R. 

P-1 

)\ 



where 


Q P — Q ? -i(l — ript-i 23. .) 

ft,p+I.23 . = Tp-l/Rp-l, 








and 


T v h h ■ • r p _i 

Vi 1 h ... V2 

!Tp_i = r^j ri 1 f p _ 8 


D r p _ 2 r P _ 3 ... 1 


Therefore, if we make a change of variable to r hr . ul ., r liPi23 ,.,.., fu 2 , n) 
we find that the new variables which correspond exactly to partial correlation 
coefficients are, in fact, independently distributed as such, with 3 degrees of 
freedom more than in the case where the sets of variables are distinct observa¬ 
tions. 


While the above properties do not prove that the $r_i$ or $\tilde r_i$ may be tested
using partial or multiple correlation coefficients, this conjecture has been verified
elsewhere and it has been shown [8] that, with certain adjustments, a test can be
derived which is applicable to fairly short series.


REFERENCES 

[1] R. L. Anderson, "Distribution of the serial correlation coefficient," Annals of Math.
Stat., Vol. 13 (1942), pp. 1-13.

[2] T. Koopmans, "Serial correlation and quadratic forms in normal variables," Annals of
Math. Stat., Vol. 13 (1942), pp. 14-33.

[3] M. S. Bartlett, "On the theoretical specification of sampling properties of autocorrelated
time series," Roy. Stat. Soc. Suppl., Vol. 8 (1946), pp. 27-41.

[4] W. G. Madow, "Note on the distribution of the serial correlation coefficient," Annals
of Math. Stat., Vol. 16 (1945), pp. 308-310.

[5] L. Garding, Proceedings of Lund University Mathematical Seminars, Vol. 5, pp. 185-202.

[6] W. J. Dixon, "Further contributions to the problem of serial correlation," Annals of
Math. Stat., Vol. 15 (1944), pp. 119-144.

[7] M. H. Quenouille, "Some results in the testing of the serial correlation coefficient,"
Biometrika, Vol. 35 (1948), pp. 261-7.

[8] M. H. Quenouille, "Approximate tests of correlation in time series I," Roy. Stat. Soc.
Suppl., Vol. 11 (1949).




ON THE ESTIMATION OF THE NUMBER OF CLASSES IN A 

POPULATION 1 

By Leo A. Goodman 
Princeton University 


1. Summary. This paper deals with the following problem: Suppose a population
of known size N is subdivided into an unknown number of mutually exclusive
classes. It is assumed that the class in which an element is contained may be
determined, but that the classes are not ordered. Let us draw a random sample
of n elements without replacement from the population. The problem is to
estimate the total number K of classes which subdivide the population on the
basis of the sample results and our knowledge of the population size.

There is exactly one real valued statistic S which is an unbiased estimate of K 
when the sample size n is not less than the maximum number q of elements 
contained in any class. The restriction placed upon q is unimportant for many 
practical problems where either there is a reasonably low bound for q or those 
classes containing more than n elements are known. An unbiased estimate does 
not exist when there is no such knowledge. 

Since the unbiased estimate can be very unreasonable, modifications of S are
considered. The statistic

$$T' = \begin{cases}
S' = N - \dfrac{N(N - 1)}{n(n - 1)}\, x_2, & \text{if } S' \ge \sum_{i=1}^{n} x_i,\\[1.5ex]
\sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i,
\end{cases}$$

where $x_i$ is the number of classes containing $i$ elements in the sample,
is the most suitable estimate, in comparison with three other statistics, for a
hypothetical population.

The case where each element in the population has an equal and independent 
chance of coming into the sample is used as a model for some sampling procedures 
and also as an approximation to the case of random sampling. 


2. Introduction. The problem discussed may be described in terms of colored
balls in an urn. How should we estimate the number of colors present in the urn
on the basis of both the sample, which gives the number of, say, white balls, red
balls, etc., and our knowledge of the total number of balls in the urn?

The following practical cases illustrate some of the ways in which this problem
presents itself:

(1) A company has received a large number of requests for a free sample of 
its product. It is known that the same people often send more than one request. 


1 Prepared in connection with research sponsored by the Office of Naval Research.


From a sample of the requests we wish to estimate how many different people
have sent requests.²

(2) The Social Security Board possesses a large collection of Social Security 
cards. It is known that some people obtain different cards when they change 
jobs. From a sample of the cards it is desired to estimate how many different 
people have Social Security cards.³

(3) A person who sells durable commodities anticipates opening a store 
which is to be located at a highway intersection. He would like to know how 
many different automobiles pass through the intersection in a given time period. 
The total number of automobiles may be easily observed but some probably 
pass through more than once. This type of inquiry is also useful to advertising 
agencies which must decide the most efficient location for billboards. 

(4) The State Unemployment Compensation Board possesses a large list
of the people receiving unemployment benefits. It is desired to estimate the 
total number of families benefiting from the insurance program on the basis of a 
random sample of the people named on the list. 

(5) The number of words in a book may be easily estimated and a sample can 
be taken. The problem of estimating the number of different words in a book is 
another analogue of the general problem.⁴

3. Results and derivations. In order to show that an unbiased estimate of the
number of classes in a population exists when the sample size n is not less than
the maximum number q of elements contained in any class, we need prove the
following two statements:

Lemma 1. Suppose we have $K$ classes of $N$ similar elements with $n_1$ elements in
class 1, $n_2$ elements in class 2, $\dots$, $n_K$ elements in class $K$. The class of an element
is readily identifiable when the element is examined. Let

$$q = \max_i\,(n_i).$$

Suppose a random sample is drawn without replacement. If $x_i$ is the number of
classes containing $i$ elements in the sample, and $K_j$ is the number of classes containing
$j$ elements in the population, then

$$E(x_i) = \sum_{j=i}^{q} \Pr(i \mid j, N, n)\, K_j,$$

where $\Pr(i \mid j, N, n)$ shall henceforth be an abbreviation of

$$\binom{j}{i} \binom{N - j}{n - i} \Big/ \binom{N}{n}.$$

2 Submitted by Charles Caliard to Questions and Answers, The American Statistician,
Vol. 3, No. 1, p. 23.

3 Mentioned to the author by Dr. J. Stevens Stock of Opinion Research Corporation.

4 Mentioned in letter to the author from Frederick Mosteller of Harvard University.





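Proof. Let $y_s$ be the number of elements appearing in the sample from the $s$-th
class. The statement is proved by considering $E(x_i) = \sum_{s=1}^{K} E(\delta_{i, y_s})$, where

$$\delta_{i, y_s} = \begin{cases} 1, & \text{if } y_s = i,\\ 0, & \text{if } y_s \ne i. \end{cases}$$

Lemma 2. Let

$$a^{(t)} = \begin{cases} a(a - 1)(a - 2) \cdots (a - t + 1), & \text{for } t > 0,\\ 1, & \text{for } t = 0. \end{cases}$$

If

$$A_i = 1 - (-1)^i\, \frac{[N - n + i - 1]^{(i)}}{n^{(i)}},$$

then⁵

$$\sum_{i=1}^{j} A_i \Pr(i \mid j, N, n) = 1.$$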


This result follows directly from the fact that

$$\sum_{i=0}^{j} (-1)^i \binom{j}{i}\, [N - n + i - 1]^{(i)}\, [N - n]^{(j - i)} = 0, \quad \text{for } j \ge 1.$$


The following theorem may be proved directly by the preceding lemmas:

Theorem 1. Suppose a sample of $n$ elements is drawn without replacement from a
population of size $N$ which is subdivided into $K$ classes. Let

$$A_i = 1 - (-1)^i\, \frac{[N - n + i - 1]^{(i)}}{n^{(i)}}.$$

If there are $x_i$ classes containing $i$ elements in the sample, then

$$E(S) = E\Big(\sum_{i=1}^{n} A_i x_i\Big) = K,$$

provided that $n$ is not less than the maximum number $q$ of elements contained in
any class in the population.
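Theorem 1 can be checked by brute force on a very small population. The sketch below is only an illustration under stated assumptions: it enumerates every sample of size $n$ drawn without replacement, computes $S = \sum_i A_i x_i$ in exact rational arithmetic, and averages. The function names and the example class sizes are hypothetical, chosen for this check rather than taken from the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def falling(a, t):
    """a^(t) = a(a-1)...(a-t+1), with a^(0) = 1."""
    result = 1
    for k in range(t):
        result *= a - k
    return result

def coefficient(i, N, n):
    """A_i = 1 - (-1)^i [N-n+i-1]^(i) / n^(i)."""
    return 1 - Fraction((-1) ** i * falling(N - n + i - 1, i), falling(n, i))

def estimate_S(sample_labels, N):
    """S = sum_i A_i x_i for one sample, given the class label of each element."""
    n = len(sample_labels)
    per_class = Counter(sample_labels)     # class label -> elements in sample
    x = Counter(per_class.values())        # i -> x_i
    return sum(coefficient(i, N, n) * x_i for i, x_i in x.items())

def expected_S(class_sizes, n):
    """Exact E(S) over all equally likely samples of size n."""
    population = [label for label, size in enumerate(class_sizes)
                  for _ in range(size)]
    N = len(population)
    samples = list(combinations(range(N), n))
    total = sum(estimate_S([population[k] for k in idx], N) for idx in samples)
    return total / len(samples)

if __name__ == "__main__":
    print(expected_S([1, 2, 2], 2))     # q = 2 <= n = 2
    print(expected_S([3, 2, 1, 1], 3))  # q = 3 <= n = 3
    print(expected_S([3, 1], 2))        # q = 3 >  n = 2
```

With $q \le n$ the average equals the number of classes exactly; the last call shows a case with $q > n$, where the condition of the theorem fails and the average no longer equals $K$.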

Theorem 2. There is at most one real valued statistic which is an unbiased
estimate of the number of classes in a population.⁶

Proof. Let us order the points of the sample space in the following manner:
Letting $x_i$ be the number of classes containing $i$ elements in the sample, order
the sample points by increasing values of $x_n$; for equal values of $x_n$, order the
points by increasing values of $x_{n-1}$; for equal values of $x_{n-1}$, order the points
by increasing values of $x_{n-2}$; $\cdots$; for equal values of $x_3$, order the points by
increasing $x_2$. Let

$$x_1 = n - \sum_{j=2}^{n} j\, x_j.$$

To prove the theorem, we must show that to each $\theta_i$ there corresponds a
unique value $S(i)$, which must be the value of our estimate when $\theta_i$ is observed,
in order that the statistic be unbiased. To each

$$\theta_i = [x_1(i), x_2(i), \dots, x_n(i)],$$

let us associate the population $P_i$ in which the numbers of classes containing
$1, 2, 3, \dots, n$ elements are

$$P_i = \{N - n + x_1(i),\; x_2(i),\; x_3(i),\; \dots,\; x_n(i)\}.$$

If $P_1$ is the underlying population, then $\theta_i$ for all $i > 1$ will occur with a probability
of zero. Since there are $N$ classes in $P_1$, the value of the statistic must be
$S(1) = N$ whenever $\theta_1$ is observed in order that the estimate be unbiased.
The theorem may now be proved by induction.

5 The author is indebted to Professor Frederick F. Stephan of Princeton University for
a statement leading to a simplification of the original result.

6 This statement was mentioned to the author by M. P. Peisakoff of Princeton University.

Since all the $P_i$ used in the proof of Theorem 2 satisfied the condition that the
maximum number $q$ of elements contained in any class be not more than the
sample size $n$, the statistic $S$ is the only real valued statistic which is an unbiased
estimate when $q \le n$.

When the restriction that $q \le n$ is removed, it is useless to search for an
unbiased estimate since we have

Theorem 3. There does not exist an unbiased estimate of the number of classes
subdividing a population when it is not known whether the maximum number q of
elements contained in any class is not more than the sample size n.

By the preceding theorems it is clear that if an unbiased estimate exists it
must equal $S$. However, $S$ is generally not unbiased when $n < q$.

Theorem 4. Suppose the statistics $S_1, S_2, \dots, S_n$ are the solutions of the system
of linear equations

$$x_i = \sum_{j=i}^{n} \Pr(i \mid j, N, n)\, S_j, \quad \text{for } i = 1, 2, \dots, n,$$

where $x_i$ is the number of classes containing $i$ elements in a sample of size $n$ from a
population of $N$ elements. If $K_j$ is the number of classes containing $j$ elements in the
population, then $E(S_j) = K_j$, for $j = 1, 2, \dots, n$, when $n$ is not less than the
maximum number $q$ of elements contained in any class.

Proof. We observe that the statement is certainly true for $j = q + 1, q + 2,
\dots, n$, since

$$E(S_j) = K_j = 0, \quad \text{for } j = q + 1, q + 2, \dots, n.$$

The statement is also true for $j = q$, since

$$E(x_q) = \Pr(q \mid q, N, n)\, E(S_q) = \Pr(q \mid q, N, n)\, K_q.$$

To prove that $E(S_j) = K_j$, for any $j < q$, we assume it to be true for all $i > j$,
whereupon its truth for $j$ follows.

By Theorems 2 and 4, it is clear that $\sum_{j=1}^{n} S_j = S$. Since

$$\sum_{i=1}^{q} i\, K_i = N,$$

it seems reasonable to ask whether the values of the estimates $S_1, S_2, \dots, S_n$
are in agreement with the known value of the size of the population. The unbiased
estimate of $K$ can be shown to be internally consistent by

Theorem 5. Suppose a sample of size $n$ is drawn without replacement from a
population of $N$ elements which is divided into classes. If $x_i$ is the number of classes
containing $i$ elements in the sample, and if the linear equations

$$x_i = \sum_{j=i}^{n} \Pr(i \mid j, N, n)\, S_j$$

are solved simultaneously for $S_j$, then

$$\sum_{j=1}^{n} j\, S_j = N.$$

The theorem follows readily from the fact that

$$\sum_{i=1}^{n} i \Pr(i \mid j, N, n) = \frac{nj}{N} \quad \text{and} \quad \sum_{i=1}^{n} i\, x_i = n.$$
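The system of Theorems 4 and 5 is triangular, so it can be solved by back-substitution starting from $S_n$. The sketch below is a hypothetical illustration, not the author's procedure: `pr` implements the hypergeometric abbreviation $\Pr(i \mid j, N, n)$, `solve_class_counts` recovers $S_n, \dots, S_1$ from the observed $x_i$, and the sample size is taken to be $\sum_i i x_i$. Exact fractions are used so that the identity $\sum_j j S_j = N$ can be checked without rounding.

```python
from fractions import Fraction
from math import comb

def pr(i, j, N, n):
    """Pr(i | j, N, n): chance that i of a class's j elements fall in the sample."""
    if i > j or n - i > N - j:
        return Fraction(0)
    return Fraction(comb(j, i) * comb(N - j, n - i), comb(N, n))

def solve_class_counts(x, N):
    """Solve x_i = sum_{j >= i} Pr(i | j, N, n) S_j for S_n, ..., S_1.

    x maps i -> x_i (missing keys mean 0); n is recovered as sum of i * x_i.
    """
    n = sum(i * count for i, count in x.items())
    S = {}
    for i in range(n, 0, -1):
        tail = sum(pr(i, j, N, n) * S[j] for j in range(i + 1, n + 1))
        S[i] = (x.get(i, 0) - tail) / pr(i, i, N, n)
    return S

if __name__ == "__main__":
    solution = solve_class_counts({2: 1}, 5)      # one class seen twice, N = 5
    print(solution)
    print(sum(j * value for j, value in solution.items()))  # equals N
```

The individual $S_j$ can be negative or exceed $N$, as in the example; this is the sense in which the unbiased estimate can be unreasonable even though it is internally consistent.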


The variance of S may now be calculated by means of the formula

<r« - 23 AiAjUa ■= 23 m„(i, f)K,K t 

+ 23 bnAiJ) - m„(i, })]k}> , 

l«l ) 

where u {j is the covariance between x { and x,, m„(i,j) is the covariance between 
and 5 JV , when r h,n r ~ s and ns ~ l, and is the covariance between 

i,„ r and 5jv r when n, = s. 

Since the statistic $S$ can be very unreasonable, we consider other possible
estimates of $K$. The statistic

$$S' = N - \frac{N(N - 1)}{n(n - 1)}\, x_2$$

may be shown to be a modification of $S$ which replaces the number $x_i$ of classes
containing $i > 2$ elements in the sample by an additional $i x_i$ classes, each
containing only one element. Since the values of $K_i$ for $i > 2$ are relatively small
in the practical problems of Section 2, $S'$ might be used as an estimate.

Another statistic which may be used to estimate $K$ is

$$S'' = \frac{N}{n} \sum_{i=1}^{n} x_i.$$

This statistic may be shown to overestimate $K$ whenever $q \ne 1$. The estimate

$$S''' = \sum_{i=1}^{n} x_i$$

underestimates $K$ when $n < N - m$, where $m$ is the least number of elements
contained in any class.


4. Binomial sampling. Let us suppose that each element from a population 
of N elements has an equal and independent chance p = 1/r of entering the 
sample s. In this case, the size of the sample obtained is a random variable y 
which is binomially distributed with mean Np. If a large random sample of n 
elements is drawn without replacement from a large population of size N, then 
the results when interpreted in terms of binomial samples where p = 1/r = n/N 
are a good approximation to the results obtained by the usual model. Binomial 
sampling may be considered a model of the case where one attempts to obtain 
the sampling ratio p = 1/r by drawing simultaneously an uncounted sample of 
elements which is estimated as being of the appropriate size. 

In the case of binomial sampling, the statistic

$$U = \sum_{i \ge 1} B_i x_i, \quad \text{where } B_i = 1 - (1 - r)^i,$$

may be shown to be an unbiased estimate of the number of classes in a population
from which binomial samples are drawn.

Let us now consider the statistic which corresponds to $S'$ for the case of
binomial sampling; i.e.,

$$B' = N - r^2 x_2.$$

It may be shown that

$$E(B') = K_1 + K_2 + \sum_{j=3}^{q} \Big[\, j - \binom{j}{2} (1 - p)^{j-2} \Big] K_j.$$

Hence, the statistic $B'$ will underestimate $K$ whenever

$$p < 1 - \Big(\frac{2}{j}\Big)^{1/(j-2)}, \quad \text{for } j = 3, 4, \dots, q.$$

Since

$$1 - \Big(\frac{2}{j}\Big)^{1/(j-2)}$$

is a decreasing function of $j$ for $j > 2$, when $p > \frac13$, $B'$ overestimates, and when

$$p < 1 - \Big(\frac{2}{q}\Big)^{1/(q-2)},$$

$B'$ underestimates the value of $K$. When $p$ is such that

$$1 - \Big(\frac{2}{q}\Big)^{1/(q-2)} < p < \frac13,$$

the expected value of $B'$ is brought closer to $K$ by underweighting some $K_j$
and overweighting others.
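Both claims about binomial sampling reduce to statements about a single class of size $j$, which makes them easy to check. The sketch below is illustrative only and its function names are hypothetical: `class_expectation_U` is the expected contribution of one class of size $j$ to $U$, which should equal 1 for every $j$ if $U$ is unbiased; `weight_in_B_prime` is the coefficient of $K_j$ in $E(B')$; `threshold` is the bound on $p$ below which that coefficient is less than 1.

```python
from fractions import Fraction
from math import comb

def class_expectation_U(j, p):
    """Expected contribution to U of one class of size j, with r = 1/p."""
    r = 1 / p
    return sum((1 - (1 - r) ** i) * comb(j, i) * p ** i * (1 - p) ** (j - i)
               for i in range(1, j + 1))

def weight_in_B_prime(j, p):
    """Coefficient of K_j in E(B'): j - C(j, 2) (1 - p)^(j - 2)."""
    return j - comb(j, 2) * (1 - p) ** (j - 2)

def threshold(j):
    """K_j is underweighted in E(B') exactly when p < 1 - (2/j)^(1/(j-2)), j >= 3."""
    return 1 - (2 / j) ** (1 / (j - 2))

if __name__ == "__main__":
    p = Fraction(1, 10)
    print([class_expectation_U(j, p) for j in range(1, 7)])
    print([weight_in_B_prime(j, p) for j in range(1, 6)])
    print([round(threshold(j), 4) for j in range(3, 8)])
```

Passing `p` as a `Fraction` keeps the unbiasedness check exact; the thresholds involve fractional powers and are computed in floating point.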


5. A hypothetical population.⁷ Suppose we draw a random sample of 1000
elements without replacement from a population of 10,000 elements where

$$K_1 = 9225, \quad K_2 = 336, \quad K_3 = 33, \quad K_4 = 1.$$

Hence, $K = 9595$. By means of Table 1, let us now compare on the basis of
binomial sampling the estimates which have been presented in the preceding
sections. Since $N$ and $n$ are large, these results are a good approximation to the
case of random sampling without replacement.


TABLE 1

Estimate    Expected value    Bias     √(Mean square error)
S           9595              0        347
S'          9570              -25      207
S''         9956              361      490
S'''        996               -8599    8600
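The expected values in Table 1 can be recomputed under binomial sampling with $p = n/N = 1/10$. The sketch below rests on several assumptions that should be read as such: the class counts are taken as $K_1 = 9225$, $K_2 = 336$, $K_3 = 33$, $K_4 = 1$ (the reading that gives 10,000 elements and 9595 classes), $S$ is represented by its binomial counterpart $U$, $S'$ by $B'$, $S''$ by $r \sum x_i$ and $S'''$ by $\sum x_i$. Only expected values are computed; the mean square errors are not reproduced here.

```python
from math import comb

N, n = 10_000, 1_000
p = n / N                                # sampling ratio p = 1/r
r = 1 / p
K = {1: 9225, 2: 336, 3: 33, 4: 1}       # assumed class-size counts (see lead-in)

def prob(i, j):
    """Binomial chance that a class of size j contributes i sample elements."""
    return comb(j, i) * p ** i * (1 - p) ** (j - i)

def expected_x(i):
    """E(x_i): expected number of classes with exactly i elements in the sample."""
    return sum(K_j * prob(i, j) for j, K_j in K.items() if j >= i)

def expected_values():
    """Expected values of the four estimates under binomial sampling."""
    largest = max(K)
    classes_seen = sum(expected_x(i) for i in range(1, largest + 1))
    return {
        "S": sum((1 - (1 - r) ** i) * expected_x(i)
                 for i in range(1, largest + 1)),
        "S'": N - r ** 2 * expected_x(2),
        "S''": r * classes_seen,
        "S'''": classes_seen,
    }

if __name__ == "__main__":
    total_classes = sum(K.values())
    for name, value in expected_values().items():
        print(name, round(value), round(value - total_classes))
```

Under these assumptions the rounded expected values and biases agree with the first two numeric columns of the table.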


It is clear that the best estimates of the number of classes in this particular
population are $S$ or $S'$, since $S$ has the least bias, $E(S) - K$, and $S'$ has the
least mean square error, $E(S' - K)^2$. One might argue that both $S$ and $S'$ are
statistics which are capable of giving nonsensical estimates. However, we
may decide to modify $S$ or $S'$ in order to always get reasonable estimates by
using the statistics

$$T = \begin{cases}
S, & \text{if } N \ge S \ge \sum_{i=1}^{n} x_i,\\
N, & \text{if } S > N,\\
\sum_{i=1}^{n} x_i, & \text{if } S < \sum_{i=1}^{n} x_i,
\end{cases}
\qquad
T' = \begin{cases}
S', & \text{if } S' \ge \sum_{i=1}^{n} x_i,\\
\sum_{i=1}^{n} x_i, & \text{if } S' < \sum_{i=1}^{n} x_i.
\end{cases}$$
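The modified statistics simply confine the estimate to the range of values that are possible given the sample: no fewer classes than were actually observed, and no more than $N$. A minimal sketch follows, with hypothetical function names; `x` maps $i$ to $x_i$, and $S'$ is computed from the formula $N - \frac{N(N-1)}{n(n-1)} x_2$ used in the text.

```python
def modified_T(S, N, x):
    """T: the estimate S confined to [number of classes observed, N]."""
    classes_seen = sum(x.values())
    return min(max(S, classes_seen), N)

def modified_T_prime(N, n, x):
    """T': S' = N - N(N-1)/(n(n-1)) * x_2, raised to the observed class count if lower."""
    classes_seen = sum(x.values())
    s_prime = N - N * (N - 1) / (n * (n - 1)) * x.get(2, 0)
    return max(s_prime, classes_seen)

if __name__ == "__main__":
    print(modified_T(-5, 5, {2: 1}))              # S below the observed count
    print(modified_T(12, 10, {1: 3}))             # S above the population size
    print(modified_T_prime(10, 4, {1: 2, 2: 1}))  # S' = 2.5 but 3 classes seen
```

No upper bound is needed for $T'$ because $S'$ never exceeds $N$.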


7 Other examples have been investigated by Frederick Mosteller in Questions and 
Answers, The American Statistician, Vol. 3, No. 3, p. 12. 





Although these modified statistics T and T' are not unbiased, they have the 
desirable property that 

$$\mathrm{MSE}(T) \le \mathrm{MSE}(S), \quad \text{and} \quad \mathrm{MSE}(T') \le \mathrm{MSE}(S').$$

Since this hypothetical population is a plausible one for the practical problems 
of Section 2, the modified statistics T or T' seem, therefore, to be “best” for 
estimating the number of classes for these problems, where the “best” statistic 
is defined as the one which never gives unreasonable estimates and has the least 
mean square error. 

The author wishes to express his appreciation to Professor John W. Tukey 
whose suggestions were very helpful. 



CONCERNING COMPOUND RANDOMIZATION IN THE BINARY 

SYSTEM 

By John E. Walsh
The Rand Corporation 

1. Summary. Let us consider a set of approximately random binary digits
obtained by some experimental process. This paper outlines a method of com¬ 
pounding the digits of this set to obtain a smaller set of binary digits which is 
much more nearly random. The method presented has the property that the 
number of digits in the compounded set is a reasonably large fraction (say of the
magnitude -J or \) of the original number of digits. 

If a set of very nearly random decimal digits is required, this can be obtained 
by first finding a set of very nearly random binary digits and then converting 
these digits to decimal digits. 

The concept of "maximum bias" is introduced to measure the degree of
randomness of a set of digits. A small maximum bias shows that the set is very
nearly random.

The question of when a table of approximately random digits can be considered 
suitable for use as a random digit table is investigated. It is found that a table 
will be satisfactory for the usual types of situations to which a random digit
table is applied if the reciprocal of the number of digits in the table is noticeably 
greater than the maximum bias of the table. 

2. Introduction and discussion. With the development of the theory of games 
and the more widespread use of experimental methods for determining approxi¬ 
mate distributions for statistics whose probability laws are difficult to obtain 
analytically, a demand for large sets of random digits has arisen. The problem of
obtaining a set of digits which can be considered sufficiently random for the
situations to which it would be applied, however, is not an easy one. One approach
to this problem consists in obtaining a set of digits by some procedure and then
applying tests to this set of digits to determine whether it can be considered
satisfactory. Although appropriate choice of the tests may result in acceptance
of sets of digits which are suitable for certain special types of situations, this
approach is of a negative character and does not prove that a given set of digits 
is sufficiently random; it merely indicates that this may be the case. What is 
needed is a constructive approach to the problem, i.e., a method of constructing a 
set of random digits which can be proved sufficiently random for most applica¬ 
tions if certain intuitively acceptable conditions are satisfied. A step in this 
direction has already been taken by H. Burke Horton in [1] and by H. Burke 
Horton and H. Tynes Smith III in [2], This paper presents what is hoped will be 
another step in this direction. 

In this paper, considerations will be limited to the case of binary digits. The 
reasons for this are twofold: 




(a) The method used for compounding the digits yields a sharp upper
bound for the maximum bias of the compounded set (i.e., a bound that
the maximum bias could actually attain) only for the case of binary digits.

(b) Many of the experimental procedures for obtaining approximately
random digits consist in first producing binary digits and then converting
to another number base. Thus binary digits are produced directly.
Hence, to use the results of this paper, the only modification required in
these procedures would be to compound the binary digits before they
are converted.

Now let us consider some definitions: A set of random variables each of which 
can assume only the values 0 and 1 will be referred to as a set of binary digits. 
For convenience, each of the random variables making up a set of binary digits 
will be called a binary digit; this is not to be confused with the value obtained 
for the random variable. The absolute value of the deviation from ½ of the
conditional probability that a specified binary digit has the value 0 (or 1) is 
called the bias of that digit for the given conditions on the remaining digits of 
the set. The maximum bias of a binary digit is defined to be the maximum of the 
biases of that digit with respect to all possible conditions on the remaining 
digits of the set. The maximum bias of the set is the greatest of the maximum 
biases of the digits of the set. A set of binary digits is said to be random if its 
maximum bias is zero. 
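These definitions can be made concrete for a small set of binary digits whose joint distribution is known. The sketch below is an illustration under stated assumptions, not part of the paper's method: the joint distribution is supplied as a dictionary (a hypothetical representation), and the "conditions on the remaining digits" are taken to be complete assignments of values to all the other digits. That restriction loses nothing for the maximum, since the conditional probability under any coarser condition is a weighted average of the conditional probabilities under complete assignments.

```python
from itertools import product

def maximum_bias(joint):
    """Maximum bias of a set of binary digits.

    joint maps each tuple of 0/1 values (one entry per digit) to its
    probability; tuples that are absent are treated as having probability 0.
    """
    k = len(next(iter(joint)))
    worst = 0.0
    for d in range(k):                                  # digit under examination
        for rest in product((0, 1), repeat=k - 1):      # values of the other digits
            p0 = joint.get(rest[:d] + (0,) + rest[d:], 0.0)
            p1 = joint.get(rest[:d] + (1,) + rest[d:], 0.0)
            if p0 + p1 > 0:                             # condition must be possible
                worst = max(worst, abs(p0 / (p0 + p1) - 0.5))
    return worst

if __name__ == "__main__":
    fair = {bits: 0.25 for bits in product((0, 1), repeat=2)}
    skewed = {(0, 0): 0.3, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.2}
    linked = {(0, 0): 0.5, (1, 1): 0.5}
    print(maximum_bias(fair), maximum_bias(skewed), maximum_bias(linked))
```

The third example shows why the definition conditions on the remaining digits: each digit is individually unbiased, yet the set is as far from random as it can be.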

The method used to prove that a set of compounded digits has a sufficiently 
small maximum bias is somewhat similar to the situation encountered in 
mathematics where one begins with certain axioms and then draws conclusions. If the 
axioms are correct, the conclusions are necessarily valid. The first step in the 
compounding procedure consists in obtaining a set of binary digits by some 
experimental process (perhaps from a random digit machine which is based on 
some physical principle). The experimental process is so chosen that there is no 
doubt that the set of binary digits produced satisfies the two conditions: 

(i). The maximum bias of the set is less than or equal to some specified 
value a (< 1/2). 

(ii). The digits of the set can be arranged in a specified array which has the 
property that the rows of the array are statistically independent. 

On the basis of these two assumptions (which play the same role as the axioms 
mentioned above), it can be proved that the maximum bias of the resulting 
compounded set of binary digits never exceeds a specified value which depends 
on a. Moreover, the upper bound for the maximum bias of the constructed set of 
binary digits can be made extremely small even for large values of a. 

If the experimental process is suitably chosen, conditions (i) and (ii) can be 
satisfied beyond any doubt. For example, let us consider 1000 people located in 
different parts of the world and not in contact with each other. Let each person 
flip an ordinary coin high in the air so that it will land on a flat hard surface, 
record the result (say 0 for a tail and 1 for a head), and then repeat this procedure 
until 5000 binary digits are obtained. If a is set equal to 3/10, condition (i) is 
obviously satisfied for the resulting set of 5,000,000 binary digits. Condition (ii) 
evidently holds if the array is taken to consist of 1000 rows where each row 
contains 5000 binary digits obtained from one person. 

The ideal choice for a would be the actual maximum bias of the set of binary 
digits obtained from the experimental process. Then the compounding procedure 
for obtaining a set of digits with a specified upper bound for the maximum bias 
would be simplified; also the number of digits in the compounded set would be a 
larger fraction of the original number of digits. Invariably, however, the 
properties of the experimental process are not known with sufficient accuracy for 
obtaining anything but a safe upper bound on the maximum bias of the set of 
digits produced. This situation is analogous to that of estimating the length 
of a stick which a very rough measurement has shown to be about 10" long. 
Although one might be very hesitant to believe that the length of the stick lies 
between 9.9" and 10.1", the contention that the length lies between 5" and 15" 
can be accepted with virtual certainty and any logical conclusions based on this 
contention can also be accepted with virtual certainty. 

Given the number of binary digits in a set and the maximum bias of the set, 
is it possible to determine whether the set is suitable for use as a set of random 
binary digits? An important consideration in answering this question is the use 
that is to be made of the set of digits. This must always be taken into account 
before the suitability of the set can be decided. For example, if no more than 
1/1000 of the digits of the set are to be used for any particular situation, the 
set might be satisfactory for the types of cases to which it would be applied; 
on the other hand, the set might not be suitable for cases of these types if all the 
digits of the set are used for each situation. This example calls attention to 
an important point, namely that the suitability of a set of binary digits depends 
on the number of digits in the set. Let a set have a fixed non-zero maximum 
bias $p$. If the set contains a sufficiently large number $N$ of digits, relations and 
expressions involving the digits of the set can be found whose probabilities, 
moments, etc., can differ greatly from the values which would be obtained if the 
relations were based on the same number of truly random binary digits. As a 
specific example consider the relation 

All the digits of the set have the value zero. 

If the reciprocal of the number of digits in the set is of the same order of 
magnitude or smaller than the maximum bias of the set, the ratio of the probability 
of this expression to its hypothetical value can differ noticeably from unity. 
Thus, at least in certain special cases, a necessary condition for the suitability 
of a set of binary digits is that $1/N \gg p$. This condition, however, is also 
sufficient for most situations to which a set of random digits would be applied. 
The approximate sufficiency of the condition is a direct consequence of the fact 
that any set of $N$ binary digits can be considered as a sample value from an 
$N$-dimensional population consisting of $2^N$ discrete points. The $1/N \gg p$ 
restriction implies that the probability concentrated at each of the $2^N$ points is 
very nearly equal to the hypothetical value of $(1/2)^N$ for all possible conditions 
on the remaining digits of the set. 

The $1/N \gg p$ condition is very satisfactory from the viewpoint of 
probabilities. The probability of any relation based on a subset of the digits of the set 
(possibly conditioned on other digits from the table) can be interpreted as the 
sum of the probabilities of those points included in a certain region (defined by 
the relation) of the $N$-dimensional probability space of the set of digits. By 
expanding $(1/2 \pm p)^N$ it can be shown that the ratio of the probability of any 
relation based on one or more digits from the set to the corresponding value for a 
truly random set of digits will be very nearly equal to unity if $1/N \gg p$. 

It is evident that the higher order moments of an expression based on one or 
more digits of the set can differ noticeably from its hypothetical value even if 
$1/N \gg p$; any deviation from the ideal situation, no matter how small, can 
become important for high order moments. For the first few moments, however, 
deviations from the hypothetical values are not appreciable since these moments 
are based on the probabilities at the $2^N$ points in the $N$-dimensional probability 
space and these probabilities are very nearly equal to the hypothetical value of 
$(1/2)^N$ in all cases. 

The above discussion shows that the values of $N$ and $p$ are sufficient to 
determine whether a set of binary digits is suitable for use as random binary digits 
for a wide variety of situations. Analogous considerations apply for digits to any 
number base. 

A magnitude definition of the relation $1/N \gg p$ is difficult to specify. If $p$ 
is the upper bound for the maximum bias of a set of digits obtained by the 
compounding procedure outlined in this paper, however, it seems that a 
reasonable condition would be that $1/N > 50p$. This condition implies that the 
probability of any relation based on digits of the set can not differ from its 
hypothetical value by more than approximately 4%. In most practical 
applications the value obtained for $p$ would be noticeably greater than the 
true value of the maximum bias of the compounded set. 
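
A worked instance of the relation "all the digits of the set have the value zero" gives a feel for the 4% figure. The sketch below assumes, purely for illustration, that every digit independently takes the value 0 with probability 1/2 + p, so that the ratio of the probability of the relation to its hypothetical value (1/2)^N is (1 + 2p)^N. The function name and the value of p are hypothetical.

```python
def all_zero_ratio(p, n_digits):
    """Ratio of Pr{all N digits are 0} to (1/2)**N when each digit is
    independently 0 with probability 1/2 + p (an illustrative assumption)."""
    return (1.0 + 2.0 * p) ** n_digits

p = 1e-6
n_at_limit = round(1 / (50 * p))                     # N chosen so that 1/N = 50 p
ratio_at_limit = all_zero_ratio(p, n_at_limit)       # close to exp(0.04)
ratio_too_large = all_zero_ratio(p, round(1 / p))    # 1/N = p, close to exp(2)
print(ratio_at_limit, ratio_too_large)
```

With 1/N = 50p the ratio stays within roughly 4% of unity, whereas a set with 1/N of the same order as p gives a ratio several times larger than unity.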

Since the maximum number of digits which can be taken from a table is the 
total number of digits in the table, the above considerations suggest that a 
random digit table should be constructed so that the reciprocal of the number of 
digits in the table is noticeably greater than the maximum bias of the table. 
Any table having this property would be satisfactory for most situations to 
which it would be applied. 

Now let us consider two different compounding methods which produce sets 
of binary digits with the same upper bound for the maximum bias. If the 
computational difficulties of applying the two methods are of comparable magnitude, 
it seems reasonable to prefer the method which yields the larger set. 
For example, if the number of digits in the set obtained by the first method is 
only 1/8 of the original number of digits while the number in the set obtained 
by the second method is 1/3 of the original number, the second method would 
seem preferable even if it required as much as 100% more computation. 





The compounding method presented in this paper has the property that the 
number of digits in the compounded set can be held to a reasonably large fraction 
of the original number of digits at the same time that the upper bound for the 
maximum bias is made extremely small. The method presented by Horton in [1] 
does not have this property. For example, let $a = 1/10$. Applying Horton's 
method, when the compounded set consists of 1/8 of the original number of 
digits the upper bound for the maximum bias is $12.8 \times 10^{-7}$. The example 
presented in section 3, however, shows that a compounded set whose number of 
digits equals 1/3 of the original number and which has an upper limit of 
$11.7 \times 10^{-7}$ for the maximum bias can be obtained using the method presented in the 
next section. 

Although the compounding method outlined in section 3 is presented as a 
series of steps, the value of a digit of the compounded set can be written as a 
linear function (mod 2) of digits of the original set. This was not done in what 
follows because of the complicated nature of the general form of such expressions. 
In any particular case, however, these expressions can be written without much 
trouble and the compounded digits computed from the original digits in a 
single step. 

3. Outline of compounding method and statement of theorems. This section 
contains a description of the compounding method mentioned in the preceding 
two sections as well as statements of the basic theorems concerning this 
compounding method. Proofs of the results stated in this section are given in section 4. 

Let us consider the array of $mn$ binary digits 

$$
\begin{array}{cccc}
x_{11}, & x_{12}, & \cdots, & x_{1n} \\
x_{21}, & x_{22}, & \cdots, & x_{2n} \\
\vdots & \vdots & & \vdots \\
x_{m1}, & x_{m2}, & \cdots, & x_{mn}
\end{array}
\tag{1}
$$

which satisfies conditions (i) and (ii); i.e., the maximum bias of the set (1) is 
less than or equal to $a$ while a digit $x_{rs}$ is independent of a digit $x_{uv}$ if $r \ne u$ 
(if $r = u$, however, $x_{uv}$ is not necessarily independent of $x_{rs}$). 

Let a new set of $(m - 1)n$ binary digits 

$$
y_{ij}, \qquad (i = 1, \ldots, m - 1;\ j = 1, \ldots, n), \tag{2}
$$

be formed as follows: 

$$
y_{ij} = x_{mj} + x_{ij} \pmod 2, \qquad (i = 1, \ldots, m - 1;\ j = 1, \ldots, n).
$$
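
The basic operation just defined, each of the first m - 1 rows added (mod 2) to the last row, can be sketched in a few lines. The function name `compound_once` and the list-of-rows representation are illustrative choices, not part of the paper.

```python
def compound_once(x):
    """Form y[i][j] = x[m-1][j] + x[i][j] (mod 2) from an m-by-n array x.

    x is a list of m rows, each a list of n binary digits; the last row
    plays the role of row m in the text.  Returns m - 1 rows of n digits.
    """
    last = x[-1]
    return [[(last_j + x_ij) % 2 for last_j, x_ij in zip(last, row)]
            for row in x[:-1]]

# A small 3-by-4 array of made-up digits.
x = [[0, 1, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 1, 0]]
y = compound_once(x)
print(y)
```

Each compounded digit uses the last row once, which is why the m rows yield only (m - 1)n digits.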

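Then the biases of the $y_{ij}$ have the properties 

Theorem 1. Let $U$ be a specified set of $t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$, 
$(1 \le t \le m - 1)$, while $V$ is a specified set of zero or more of the $y_{pq}$'s 
with $q \ne j$. Also let $\theta$ consist of the set of integers such that $v \in \theta$ if $y_{vj} \in U$. Then, 
if $\gamma_u$ = maximum bias for the set $x_{u1}, \ldots, x_{un}$, $(u = 1, \ldots, m)$, 

$$
\left| \Pr\{y_{ij} = 0 \mid U, V\} - \tfrac12 \right|
\le \gamma_i \left[ 1 - \prod_{k \in \theta \cup \{m\}} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
\Bigg/ \left[ 1 + \prod_{k \in \theta \cup \{m\}} \frac{\tfrac12 - \gamma_k}{\tfrac12 + \gamma_k} \right]
$$

for all possible selections of $U$, $V$ and of the values for the digits of these sets. 

Corollary 1. If $t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$ have 
known values, the maximum bias of the binary digit $y_{ij}$ is less than or equal to 

$$
a\left[1 - (\tfrac12 - a)^t/(\tfrac12 + a)^t\right] \Big/ \left[1 + (\tfrac12 - a)^t/(\tfrac12 + a)^t\right].
$$

Corollary 2. The maximum bias of the set (2) is less than or equal to 

$$
a\left[1 - (\tfrac12 - a)^{m-1}/(\tfrac12 + a)^{m-1}\right] \Big/ \left[1 + (\tfrac12 - a)^{m-1}/(\tfrac12 + a)^{m-1}\right].
$$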
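
The bounds in the two corollaries are easy to evaluate exactly. The sketch below uses rational arithmetic and reads the corollaries as the expression a[1 - r^t]/[1 + r^t] with r = (1/2 - a)/(1/2 + a), taking t = m - 1 for the whole set; the function names are hypothetical.

```python
from fractions import Fraction

def bias_bound(a, t):
    """a[1 - r**t] / [1 + r**t] with r = (1/2 - a)/(1/2 + a)  (Corollary 1)."""
    a = Fraction(a)
    r = (Fraction(1, 2) - a) / (Fraction(1, 2) + a)
    return a * (1 - r**t) / (1 + r**t)

def set_bias_bound(a, m):
    """Bound for the whole compounded set built from m rows (Corollary 2)."""
    return bias_bound(a, m - 1)

print(bias_bound(Fraction(1, 10), 1))       # two rows compounded into one
print(set_bias_bound(Fraction(1, 10), 3))   # three rows compounded into two
```

For a = 1/10 and a single pair of rows the bound is 1/50, the same value as the first line of the example later in this section.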

The basic operation in the method of compounding binary digits is outlined 
in the procedure given for obtaining the $y_{ij}$ from the $x_{uv}$. Let $m = (1 + t_1) \cdots (1 + t_K)$. 
Then a set of $t_1 \cdots t_K n$ binary digits can be obtained from the original 
set of $mn$ digits $x_{uv}$ by continually applying this basic procedure. The first step 
consists in dividing the rows of (1) into $(1 + t_2) \cdots (1 + t_K)$ sets each consisting 
of $(1 + t_1)$ rows in some specified fashion. Each of these sets is an array of 
$(1 + t_1) \times n$ binary digits for which the rows are independent. Apply the method 
used to obtain the $y_{ij}$ from the $x_{uv}$ to each $(1 + t_1) \times n$ array separately. Then 
each array yields a set of $t_1 n$ binary digits and there are $(1 + t_2) \cdots (1 + t_K)$ 
such sets. In each set arrange the $t_1 n$ digits into a single row in some specified 
manner. This furnishes a new array of $[(1 + t_2) \cdots (1 + t_K)] \times [t_1 n]$ binary 
digits for which the rows are independent. Repeat this procedure with respect to 
$t_2$, thus obtaining a new array of $[(1 + t_3) \cdots (1 + t_K)] \times [t_1 t_2 n]$ binary digits for 
which the rows are independent; etc., until a $(1 + t_K) \times (t_1 \cdots t_{K-1} n)$ binary 
digit array for which the rows are independent is obtained. Then form a set of 
binary digits $Y_{gh}$, $(g = 1, \ldots, t_K;\ h = 1, \ldots, t_1 \cdots t_{K-1} n)$, from this array in 
exactly the same manner that the $y_{ij}$ were obtained from the $x_{uv}$. Then the 
biases of the $Y_{gh}$ have the properties 
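Theorem 2. Let $\beta_0, \beta_1, \ldots, \beta_K$ be defined by $\beta_0 = a$ and 

$$
\beta_w = \beta_{w-1}\left[1 - (\tfrac12 - \beta_{w-1})^{t_w}/(\tfrac12 + \beta_{w-1})^{t_w}\right] \Big/ \left[1 + (\tfrac12 - \beta_{w-1})^{t_w}/(\tfrac12 + \beta_{w-1})^{t_w}\right],
\qquad (w = 1, \ldots, K).
$$

Then, if exactly $t - 1$ of $Y_{1h}, \ldots, Y_{(g-1)h}, Y_{(g+1)h}, \ldots, Y_{t_K h}$ have known values, 
$(1 \le t \le t_K)$, the maximum bias of the digit $Y_{gh}$ is less than or equal to 

$$
\beta_{K-1}\left[1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right] \Big/ \left[1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right].
$$

In particular, the maximum bias of the entire set of $Y_{gh}$ is less than or equal to 
$\beta_K$. Also 

$$
\beta_{K-1}\left[1 - (\tfrac12 - \beta_{K-1})^{t_K}/(\tfrac12 + \beta_{K-1})^{t_K}\right] \Big/ \left[1 + (\tfrac12 - \beta_{K-1})^{t_K}/(\tfrac12 + \beta_{K-1})^{t_K}\right]
\le 2^{2^K - 1}\, t_K\, t_{K-1}^{2}\, t_{K-2}^{4} \cdots t_1^{2^{K-1}}\, a^{2^K}.
\tag{3}
$$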





The inequality (3) is frequently useful from a computational viewpoint. 
Although the right hand side of (3) is usually noticeably greater than the left 
hand side, in many cases this rough upper bound is itself small enough to show 
that the upper bound for the maximum bias is of the desired order of magnitude. 
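
The recursion of Theorem 2 and the rough bound (3) can be compared numerically. The sketch below reads the recursion as β_w = β_{w-1}[1 - r^{t_w}]/[1 + r^{t_w}] with r = (1/2 - β_{w-1})/(1/2 + β_{w-1}), and the right-hand side of (3) as 2^(2^K - 1) t_K t_{K-1}^2 ... t_1^(2^(K-1)) a^(2^K). The function names are hypothetical; the values a = 1/10 and (t_1, ..., t_4) = (1, 3, 10, 44) are those of the paper's example.

```python
def beta_sequence(a, ts):
    """Return [beta_0, ..., beta_K] for bias bound a and ts = (t_1, ..., t_K)."""
    betas = [float(a)]
    for t in ts:
        b = betas[-1]
        r = ((0.5 - b) / (0.5 + b)) ** t
        betas.append(b * (1.0 - r) / (1.0 + r))
    return betas

def rough_bound(a, ts):
    """Right-hand side of inequality (3) as read in the lead-in above."""
    K = len(ts)
    value = 2.0 ** (2**K - 1) * float(a) ** (2**K)
    for w, t in enumerate(ts, start=1):      # t_w carries the exponent 2**(K - w)
        value *= float(t) ** (2 ** (K - w))
    return value

ts = (1, 3, 10, 44)
beta_K = beta_sequence(0.1, ts)[-1]
bound = rough_bound(0.1, ts)
print(beta_K, bound)
```

For these values the rough bound evaluates to about 1.17e-6, only slightly above β_4 itself, which illustrates why (3) is often adequate in practice.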

If the set of compounded digits is to be used for a random binary digit table, 
Theorem 2 shows that advantage can be taken of the position of the digits in the 
table. Let $M = t_1 \cdots t_{K-1} n$ and enter the values of the $Y_{gh}$, $(g = 1, \ldots, t_K;\ 
h = 1, \ldots, M)$, into the table in the order 

$$
Y_{11}, Y_{12}, \ldots, Y_{1M}, Y_{21}, \ldots, Y_{2M}, Y_{31}, \ldots, Y_{t_K 1}, \ldots, Y_{t_K M}.
$$

Then, if a set of digits is taken from this table in consecutive order ($Y_{11}$ follows 
$Y_{t_K M}$), the upper bound for the maximum bias of this set is dependent on the 
number $L$ of digits in the set. From Theorem 2, the maximum bias of a set of $L$ 
digits taken in consecutive order from a table formed in this manner is less 
than or equal to 

$$
\beta_{K-1}\left[1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right] \Big/ \left[1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right]
$$

for values of $L$ such that $(t - 1)M < L \le tM$, where $1 \le t \le t_K$. Thus, if a 
small set of digits is taken from this table in consecutive order, the upper bound 
for the maximum bias of this set will usually be noticeably smaller than the 
upper bound for the maximum bias of the table. Since many uses of a random 
digit table require only a small fraction of the total number of entries in this 
table, this property would seem to be desirable. It should be emphasized, 
however, that the maximum bias of a set taken from this table is always less than 
or equal to $\beta_K$ irrespective of the positions that the digits of the set occupy in 
the table. Thus nothing is lost by constructing the table in this manner but 
something can be gained for small sets if the digits are taken from the table in 
consecutive order. 

Now let us consider situations in which it is required that the number of 
digits in the compounded set is at least a specified fraction, say $1/C$, of the 
original number $mn$ of binary digits. This requires that $K$ and $t_1, \ldots, t_K$ be 
chosen so that 

$$
\frac{t_1 \cdots t_K}{(1 + t_1) \cdots (1 + t_K)} \ge 1/C.
$$

Also, for given values of $K$ and $C$, it seems preferable to choose $t_1, \ldots, t_K$ so that 
the value of $\beta_K$ is at least approximately minimized. Examination of the results of 
Theorem 2 indicates that a reasonable method of determining the values of 
$t_1, \ldots, t_K$ with this in mind consists in first choosing $t_1$ as small as possible, then 
(given the value of $t_1$ equal to its minimum value) choosing $t_2$ as small as possible, 
etc. This method is also recommended by the fact that the resulting values of 
$t_1, \ldots, t_K$ are readily determined. The explicit procedure for finding $t_1, \ldots, t_K$ 
is given by 

Theorem 3. Let the values of the integer $K$ and the constant $C$ $(> 1)$ be given and 
consider the integers $t_1, \ldots, t_K$ subject to the condition 

$$
\frac{t_1 \cdots t_K}{(1 + t_1) \cdots (1 + t_K)} \ge 1/C.
$$

The minimum value of $t_1$ is the smallest integer satisfying 

$$
t_1 > 1/(C - 1).
$$

In general, $2 \le w \le K - 1$, having already determined $t_1, \ldots, t_{w-1}$ as their 
minimum values, the value of $t_w$ is the smallest integer satisfying 

$$
t_w > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 \right].
$$

Finally, given $t_1, \ldots, t_{K-1}$ as their minimum values, the minimum value of $t_K$ 
is the smallest integer satisfying 

$$
t_K \ge 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 \right].
$$
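
The rule of Theorem 3 can be sketched with exact rational arithmetic. The sketch assumes the reading in which the strict inequality applies to t_1, ..., t_{K-1} and the final t_K only has to bring the fraction of retained digits up to 1/C; the function name `choose_ts` is hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def choose_ts(C, K):
    """Greedy choice of t_1, ..., t_K following the rule stated in Theorem 3.

    Returns the list of integers and the retained fraction
    t_1...t_K / ((1 + t_1)...(1 + t_K)).
    """
    C = Fraction(C)
    ts = []
    ratio = Fraction(1)                  # t_1...t_{w-1} / (1+t_1)...(1+t_{w-1})
    for w in range(1, K + 1):
        threshold = 1 / (C * ratio - 1)
        if w < K:
            t = floor(threshold) + 1     # smallest integer > threshold
        else:
            t = ceil(threshold)          # smallest integer >= threshold
        ts.append(t)
        ratio *= Fraction(t, 1 + t)
    return ts, ratio

print(choose_ts(3, 2))
```

For C = 3 and K = 2 this gives t_1 = 1 and t_2 = 2, with exactly 1/3 of the original digits retained.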

Now consider the general situation encountered in the application of the 
compounding process outlined above. Here the values of $a$, $C$ are given and it is 
required to choose $K$ and $t_1, \ldots, t_K$ so that the upper bound for the maximum 
bias of the compounded set of $t_1 \cdots t_K n$ binary digits $Y_{gh}$ is less than or equal to a 
specified value $b$. The following procedure furnishes a method of solving this 
problem: 

Let $K = 1$, obtain $t_1$ according to Theorem 3, and then compute $\beta_1$. If $\beta_1 \le b$, a 
solution has been obtained. If $\beta_1 > b$, let $K = 2$ and repeat the procedure to 
obtain $\beta_2$. If $\beta_2 \le b$, the values of $t_1$, $t_2$ and $K = 2$ are a solution. If $\beta_2 > b$, 
repeat the procedure for $K = 3$; etc. In practical situations, the value of $K$ is 
usually bounded (e.g., by independence properties of the original set of digits). 
If $\beta_K$ is still greater than $b$ for the maximum permissible value of $K$, no solution is 
obtained. This means that either $b$ must be increased or $1/C$ decreased or both 
if a solution is to be found. In many cases, a large amount of computation can be 
avoided by using the inequality (3). For marginal situations, however, a solution 
may be missed by using (3) instead of computing $\beta_K$. 
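
The search over K described above can be sketched as a short loop. The sketch assumes the smallest-integer rule of Theorem 3 (strict inequality for all but the last t) and computes β_K directly rather than through (3). The function names, the cap `K_max`, and the tolerance b = 2e-6 used in the call are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil, floor

def choose_ts(C, K):
    """Smallest admissible integers t_1, ..., t_K (rule of Theorem 3, as assumed)."""
    C, ratio, ts = Fraction(C), Fraction(1), []
    for w in range(1, K + 1):
        threshold = 1 / (C * ratio - 1)
        t = floor(threshold) + 1 if w < K else ceil(threshold)
        ts.append(t)
        ratio *= Fraction(t, t + 1)
    return ts

def final_beta(a, ts):
    """beta_K from the recursion of Theorem 2, in floating point."""
    beta = float(a)
    for t in ts:
        r = ((0.5 - beta) / (0.5 + beta)) ** t
        beta *= (1.0 - r) / (1.0 + r)
    return beta

def search(a, C, b, K_max):
    """Try K = 1, 2, ..., K_max; return (K, ts, beta_K) or None if no K works."""
    for K in range(1, K_max + 1):
        ts = choose_ts(C, K)
        beta = final_beta(a, ts)
        if beta <= b:
            return K, ts, beta
    return None

result = search(a=0.1, C=3, b=2e-6, K_max=6)
print(result)
```

With these inputs the loop stops at K = 4. Several choices of the t's can satisfy both requirements, so the integers returned by this greedy rule need not coincide with other admissible solutions.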

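Example of method. The following table represents an example of application 
of the above method: 

$a = 1/10$, $\quad 1/C = 1/3$, $\quad b = 2 \times 10^{-6}$

$K = 1$, $\quad t_1 = 1$: $\quad \beta_1 = 2 \times 10^{-2}$

$K = 2$, $\quad t_1 = 1,\ t_2 = 2$: $\quad \beta_2 < 1.6 \times 10^{-3}$

$K = 3$, $\quad t_1 = 1,\ t_2 = 3,\ t_3 = 9$: $\quad \beta_3 < 1.04 \times 10^{-4}$

$K = 4$, $\quad t_1 = 1,\ t_2 = 3,\ t_3 = 10,\ t_4 = 44$: $\quad \beta_4 < 1.17 \times 10^{-6}$.

Thus $K = 4$, $t_1 = 1$, $t_2 = 3$, $t_3 = 10$, $t_4 = 44$ is a solution. 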


4. Derivations. The purpose of this section is to furnish proofs of the results 
stated in the preceding sections. 

4.1 Proof of Theorem 1. Let us consider the conditional probability that an 
arbitrary but fixed $y_{ij}$ has a specified value when the values of a fixed subset of 
zero or more of the remaining $y$'s are known. For convenience, assume that $y_{11}$ 
is the binary digit considered and that the values of $y_{21}, y_{31}, \ldots, y_{t1}$ (where $t$ 
is a fixed integer such that $1 \le t \le m - 1$) and a set $S$ are given while the 
values of the remaining $y$'s are unknown. Here $S$ represents an arbitrary but 
fixed set of zero or more of the $y_{ij}$'s for which $j \ge 2$ while $t = 1$ has the 
interpretation that none of the $y_{i1}$, $(i \ge 2)$, are given. Let 

$$
\Pr(x_{m1} = 0 \mid S) = \tfrac12 + \alpha_{t+1} \quad \text{and} \quad \Pr(x_{k1} = b_k \mid S) = \tfrac12 + \alpha_k,
\qquad (k = 1, \ldots, t).
$$

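Then, using the independence conditions satisfied by the $x$'s, 

$$
\begin{aligned}
\Pr\{y_{11} = b_1 \mid y_{21} = b_2, \ldots, y_{t1} = b_t;\ S\}
&= \left[ \prod_{k=1}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=1}^{t+1} (\tfrac12 - \alpha_k) \right]
\Bigg/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right] \\
&= \tfrac12 + \alpha_1 \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) - \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right]
\Bigg/ \left[ \prod_{k=2}^{t+1} (\tfrac12 + \alpha_k) + \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k) \right] \\
&= \tfrac12 + \alpha_1 \delta .
\end{aligned}
$$

Now $|\delta| = (1 - P)/(1 + P)$ if $0 \le P \le 1$ and equals $(P - 1)/(1 + P)$ if 
$P > 1$, where $P = \prod_{k=2}^{t+1} (\tfrac12 - \alpha_k)/(\tfrac12 + \alpha_k)$. Let $\gamma_u$ be the maximum bias for the 
set of binary digits $x_{u1}, \ldots, x_{un}$, $(u = 1, \ldots, m)$, with $\gamma_{t+1}$ understood as $\gamma_m$ 
since $\alpha_{t+1}$ refers to row $m$. Then it is easily seen that 

$$
|\delta| \le \left[ 1 - \prod_{k=2}^{t+1} (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right]
\Bigg/ \left[ 1 + \prod_{k=2}^{t+1} (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right].
$$

Thus 

$$
\left| \Pr\{y_{11} = b_1 \mid y_{21} = b_2, \ldots, y_{t1} = b_t;\ S\} - \tfrac12 \right|
\le \gamma_1 \left[ 1 - \prod_{k=2}^{t+1} (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right]
\Bigg/ \left[ 1 + \prod_{k=2}^{t+1} (\tfrac12 - \gamma_k)/(\tfrac12 + \gamma_k) \right]
$$

for all possible selections of $b_1, \ldots, b_t$ and all possible selections of $S$ and the 
values for the digits of $S$. It is to be observed that this inequality is valid for $t = 1$. 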

Evidently this result can be modified to apply to an arbitrary $y_{ij}$ for which 
$t - 1$ of $y_{1j}, \ldots, y_{(i-1)j}, y_{(i+1)j}, \ldots, y_{(m-1)j}$ have given values. This obvious 
modification results in Theorem 1. 

4.2 Proof of Theorem 2. By Corollary 2, the maximum bias of the 
$[(1 + t_2) \cdots (1 + t_K)] \times [t_1 n]$ array is less than or equal to $\beta_1$. In general, $2 \le w < K$, by 
Corollary 2 the maximum bias of the $[(1 + t_{w+1}) \cdots (1 + t_K)] \times [t_1 \cdots t_w n]$ 
array is less than or equal to $\beta_w$. Finally, by Corollary 1, if exactly $t - 1$ of 
$Y_{1h}, \ldots, Y_{(g-1)h}, Y_{(g+1)h}, \ldots, Y_{t_K h}$ have known values, $(1 \le t \le t_K)$, the 
maximum bias for the binary digit $Y_{gh}$ is less than or equal to 

$$
\beta_{K-1}\left[1 - (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right] \Big/ \left[1 + (\tfrac12 - \beta_{K-1})^{t}/(\tfrac12 + \beta_{K-1})^{t}\right].
$$

The inequality (3) is an immediate consequence of the relation 

$$
a\left[1 - (\tfrac12 - a)^t/(\tfrac12 + a)^t\right] \Big/ \left[1 + (\tfrac12 - a)^t/(\tfrac12 + a)^t\right] \le 2ta^2.
$$

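4.3 Proof of Theorem 3. From the given condition 

$$
t_K \ge 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 \right].
$$

From this inequality for $t_K$ it follows that 

$$
\frac{C\, t_1 \cdots t_{K-1}}{(1 + t_1) \cdots (1 + t_{K-1})} - 1 > 0.
$$

Thus 

$$
t_{K-1} > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{K-2}}{(1 + t_1) \cdots (1 + t_{K-2})} - 1 \right].
$$

In general, $3 \le w \le K - 1$, given 

$$
t_w > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 \right]
$$

it follows that 

$$
\frac{C\, t_1 \cdots t_{w-1}}{(1 + t_1) \cdots (1 + t_{w-1})} - 1 > 0
$$

whence 

$$
t_{w-1} > 1 \Big/ \left[ \frac{C\, t_1 \cdots t_{w-2}}{(1 + t_1) \cdots (1 + t_{w-2})} - 1 \right].
$$

Finally 

$$
t_1 > 1/(C - 1).
$$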


REFERENCES 

[1] H. Burke Horton, "A method for obtaining random numbers," Annals of Math. Stat., 
Vol. 19 (1948), pp. 81-85. 

[2] H. Burke Horton and R. Tynes Smith III, "A direct method for producing random 
digits in any number system," Annals of Math. Stat., Vol. 20 (1949), pp. 82-90. 



THE DISTRIBUTION OF EXTREME VALUES IN SAMPLES WHOSE 
MEMBERS ARE SUBJECT TO A MARKOFF CHAIN CONDITION 

By Benjamin Epstein 

Department of Mathematics, Wayne University 

1. Introduction. The extreme value problem as treated in the literature 
concerns itself with the following question: To find the distribution of the 
smallest, largest, or more generally the $\nu$th largest, or $\nu$th smallest values in 
random samples of size $n$, drawn from a distribution whose probability law is 
given by the d.f. $F(x)$. In this formulation the observed sample values $x_1, \ldots, x_n$ 
are assumed to be statistically independent. While the assumption of 
independence may be a good approximation to the true state of affairs in some 
cases, there are situations where this assumption is not justified. 

Suppose, for instance, that the observations in the sample are ordered in time. 
Then it may happen that successive observations are stochastically dependent, 
the extent of this dependence being a function of the time interval separating 
these observations.¹ In such cases the present distribution theory for extreme 
values in samples of size $n$ is inadequate and must be replaced by more general 
results. 

It is clear that a clean-cut analytic solution to the problem of the distribution 
of extreme values in samples whose members may be stochastically dependent 
can be expected only for certain special kinds of dependence among successive 
observations. We are able, in this paper, to obtain the distribution of smallest, 
largest, second smallest, and second largest values in samples of size $n$ drawn at 
equally spaced time intervals from a stationary Markoff process. 

2. The distribution of smallest and largest values in samples of size n drawn 
at equally spaced time intervals from a stationary Markoff process. In this 
section the following assumption is made: 

(A) observations $x_1, x_2, \ldots, x_n, \ldots$ are taken in order at times $t = 1, 2, \ldots$ 
from a stationary Markoff random process. 

The only information needed in the investigation of a stationary Markoff 
process at integral values of time is the function 

$$
F_2(x, y) = \text{Prob}\,(x_i \le x,\ x_{i+1} \le y), \tag{1}
$$

independently of $i$, where $F_2(x, y)$ must be such that the marginal distribution 
obtained by integrating over $x$ or $y$ (if $x_i$ or $x_{i+1}$ take on a continuous range of 
values) or summing over the possible values of $x_i$ or $x_{i+1}$ (if $x_i$ and $x_{i+1}$ can take 
on only discrete values) is of the form 

$$
F_1(x) = \text{Prob}\,(x_i \le x), \tag{2}
$$

independently of $i$. 

¹ If the observations $x_1, x_2, \ldots, x_n, \ldots$ are taken at discrete times $t_1, t_2, \ldots$, 
a measure of stochastic dependence between $x_i$ and $x_j$ is the ordinary coefficient of 
correlation $\rho_{ij}$. If the observations are taken from a continuous stochastic process a natural 
measure of stochastic dependence between observations made at two different times is the 
covariance function of the process. In this paper we shall limit ourselves to processes which 
are discrete in time. 

An example of a random process meeting condition A is furnished by the 
Ornstein-Uhlenbeck process [1; 2]. In this case the joint d.f. of $x_i$ and $x_{i+1}$ is 
given by a non-singular bivariate Gaussian distribution. The results in the 
present paper are stated completely in terms of the d.f.'s $F_2(x, y)$ and $F_1(x)$ 
defining the stationary Markoff process and will in particular be valid for 
observations taken at uniformly spaced time intervals from an Ornstein-Uhlenbeck 
process. 

In this section we shall find the distribution of smallest and largest values in 
samples $x_1, x_2, \ldots, x_n$ drawn from a random process under assumption A and 
specified by the bivariate d.f. $F_2(x, y)$ and the associated one dimensional 
marginal d.f. $F_1(x)$. We first prove Theorem I. 

Theorem I. Under assumption A, the distribution of largest values in samples of 
size $n$ is given by the d.f. $G_n^{(1)}(x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}$. 

To prove this result we note that $G_n^{(1)}(x)$, the probability that the largest 
value in samples of size $n$ is $\le x$, is given by 

$$
G_n^{(1)}(x) = \text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_n \le x). \tag{3}
$$

To evaluate the right-hand side of (3) we proceed as follows: 

$$
\text{Prob}\,(x_1 \le x, \ldots, x_n \le x)
= \text{Prob}\,(x_1 \le x, \ldots, x_{n-1} \le x)\ \text{Prob}\,(x_n \le x \mid x_1 \le x, \ldots, x_{n-1} \le x). \tag{4}
$$

But under assumption A, (4) becomes 

$$
\text{Prob}\,(x_1 \le x, \ldots, x_n \le x)
= \text{Prob}\,(x_1 \le x, \ldots, x_{n-1} \le x)\ \text{Prob}\,(x_n \le x \mid x_{n-1} \le x) \tag{5}
$$

or 

$$
G_n^{(1)}(x) = G_{n-1}^{(1)}(x)\ \text{Prob}\,(x_n \le x \mid x_{n-1} \le x). \tag{5'}
$$

But according to assumption A, and (1) and (2) 

$$
\text{Prob}\,(x_n \le x \mid x_{n-1} \le x) = \text{Prob}\,(x_{n-1} \le x,\ x_n \le x)/\text{Prob}\,(x_{n-1} \le x) = F_2(x, x)/F_1(x). \tag{6}
$$

Therefore 

$$
\begin{aligned}
G_n^{(1)}(x) &= G_{n-1}^{(1)}(x)\, F_2(x, x)/F_1(x) \\
&= G_1^{(1)}(x)\, (F_2(x, x))^{n-1}/(F_1(x))^{n-1} \\
&= (F_2(x, x))^{n-1}/(F_1(x))^{n-2}.
\end{aligned}
\tag{7}
$$

This proves Theorem I. 

For $n = 1$, 2, and 3 respectively one gets 

$$
G_1^{(1)}(x) = F_1(x), \qquad G_2^{(1)}(x) = F_2(x, x), \qquad G_3^{(1)}(x) = [F_2(x, x)]^2/F_1(x). \tag{8}
$$
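
The formula of Theorem I can be checked by direct enumeration in a small special case. The sketch below assumes a two-state stationary chain on {0, 1} with made-up transition probabilities and takes x = 0; in a two-state chain the event x_i ≤ 0 fixes the state exactly, so the conditioning step (5) holds for this example. All names and numerical values are illustrative.

```python
from fractions import Fraction
from itertools import product

# Illustrative two-state stationary Markov chain on {0, 1} (values assumed).
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
PI = [Fraction(2, 3), Fraction(1, 3)]      # stationary distribution: PI = PI * P

def path_prob(path):
    """Probability of observing the given sequence of states."""
    prob = PI[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= P[s][t]
    return prob

def prob_max_le_zero(n):
    """Pr{largest of x_1, ..., x_n <= 0}, summed over all 2**n paths."""
    return sum(path_prob(p) for p in product((0, 1), repeat=n) if max(p) <= 0)

F1 = PI[0]                 # F_1(0) = Pr{x_i <= 0}
F2 = PI[0] * P[0][0]       # F_2(0, 0) = Pr{x_i <= 0, x_{i+1} <= 0}

def theorem_one(n):
    """[F_2(x, x)]**(n-1) / [F_1(x)]**(n-2) at x = 0."""
    return F2 ** (n - 1) / F1 ** (n - 2)

print([theorem_one(n) for n in (1, 2, 3)])
```

The enumeration and the closed form agree exactly for every sample size tried in this special case.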

Theorem II. Under assumption A, the distribution of smallest values in samples 
of size $n$ is given by the d.f. 

$$
H_n^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}. \tag{9}
$$

To prove this result we first note that $H_n^{(1)}(x)$, the probability that the smallest 
value in samples of size $n$ be $\le x$, is given by 

$$
1 - \text{Prob}\,(x_1 > x,\ x_2 > x,\ \ldots,\ x_n > x).
$$

To evaluate we proceed as follows: 

$$
\text{Prob}\,(x_1 > x, \ldots, x_n > x)
= \text{Prob}\,(x_1 > x, \ldots, x_{n-1} > x)\ \text{Prob}\,(x_n > x \mid x_1 > x, \ldots, x_{n-1} > x). \tag{10}
$$

But under assumption A, (10) becomes 

$$
\text{Prob}\,(x_1 > x, \ldots, x_n > x)
= \text{Prob}\,(x_1 > x, \ldots, x_{n-1} > x)\ \text{Prob}\,(x_n > x \mid x_{n-1} > x). \tag{11}
$$

But 

$$
\text{Prob}\,(x_n > x \mid x_{n-1} > x) = \text{Prob}\,(x_{n-1} > x,\ x_n > x)/\text{Prob}\,(x_{n-1} > x). \tag{12}
$$

To evaluate Prob $(x_{n-1} > x,\ x_n > x)$ we note that 

$$
\begin{aligned}
&\text{Prob}\,(x_{n-1} > x,\ x_n > x) + \text{Prob}\,(x_{n-1} \le x,\ x_n > x) \\
&\quad + \text{Prob}\,(x_{n-1} > x,\ x_n \le x) + \text{Prob}\,(x_{n-1} \le x,\ x_n \le x) = 1.
\end{aligned}
\tag{13}
$$

Also 

$$
\text{Prob}\,(x_{n-1} \le x,\ x_n > x) + \text{Prob}\,(x_{n-1} \le x,\ x_n \le x) = \text{Prob}\,(x_{n-1} \le x), \tag{14}
$$

and 

$$
\text{Prob}\,(x_{n-1} > x,\ x_n \le x) + \text{Prob}\,(x_{n-1} \le x,\ x_n \le x) = \text{Prob}\,(x_n \le x). \tag{15}
$$

Recalling that 

$$
F_2(x, x) = \text{Prob}\,(x_{n-1} \le x,\ x_n \le x) \tag{16}
$$

and 

$$
F_1(x) = \text{Prob}\,(x_{n-1} \le x) = \text{Prob}\,(x_n \le x) \tag{17}
$$



we get 

$$
\text{Prob}\,(x_{n-1} > x,\ x_n > x) = 1 - 2F_1(x) + F_2(x, x). \tag{18}
$$

Therefore (10) becomes 

$$
\text{Prob}\,(x_1 > x, \ldots, x_{n-1} > x,\ x_n > x)
= \text{Prob}\,(x_1 > x, \ldots, x_{n-1} > x)\,[1 - 2F_1(x) + F_2(x, x)]/(1 - F_1(x)). \tag{19}
$$

Applying the recursion formula (19) successively we obtain 

$$
\begin{aligned}
\text{Prob}\,(x_1 > x, \ldots, x_n > x)
&= \text{Prob}\,(x_1 > x)\,[1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-1} \\
&= [1 - 2F_1(x) + F_2(x, x)]^{n-1}/[1 - F_1(x)]^{n-2}.
\end{aligned}
\tag{20}
$$

Therefore $H_n^{(1)}(x)$, the probability that the smallest value in samples of size $n$ 
is $\le x$, is given by: 

$$
H_n^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}}. \tag{21}
$$

This completes the proof of Theorem II. 
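
The same kind of enumeration can be used to check the formula for the smallest value, read here as 1 - [1 - 2F_1 + F_2]^(n-1)/[1 - F_1]^(n-2). The sketch again assumes a two-state stationary chain on {0, 1} with made-up transition probabilities and x = 0, a special case in which conditioning on x_{n-1} > 0 fixes the state exactly. All names and values are illustrative.

```python
from fractions import Fraction
from itertools import product

# Illustrative two-state stationary Markov chain on {0, 1} (values assumed).
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
PI = [Fraction(2, 3), Fraction(1, 3)]      # stationary distribution

def path_prob(path):
    """Probability of observing the given sequence of states."""
    prob = PI[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= P[s][t]
    return prob

def prob_min_le_zero(n):
    """Pr{smallest of x_1, ..., x_n <= 0}, summed over all 2**n paths."""
    return sum(path_prob(p) for p in product((0, 1), repeat=n) if min(p) <= 0)

F1 = PI[0]                 # F_1(0)
F2 = PI[0] * P[0][0]       # F_2(0, 0)

def theorem_two(n):
    """1 - [1 - 2 F_1 + F_2]**(n-1) / [1 - F_1]**(n-2) at x = 0."""
    return 1 - (1 - 2 * F1 + F2) ** (n - 1) / (1 - F1) ** (n - 2)

print([theorem_two(n) for n in (1, 2, 3)])
```

For n = 2 the value equals 2F_1 - F_2, as expected from the general formula.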

In particular for $n = 1$, 2, and 3 respectively the d.f.'s of the smallest value in 
samples of size $n$ are given by: 

$$
H_1^{(1)}(x) = F_1(x), \qquad H_2^{(1)}(x) = 2F_1(x) - F_2(x, x), \qquad
H_3^{(1)}(x) = 1 - \frac{[1 - 2F_1(x) + F_2(x, x)]^2}{1 - F_1(x)}.
$$

3. Distribution of the second largest and second smallest values in samples 
of size n drawn at equally spaced time intervals from a stationary Markoff 
process. Under assumption A of Section 2 we can state the following theorem. 

Theorem III. Under assumption A the distribution of second largest values in 
samples of size $n$, $n \ge 2$, is given by the d.f. $G_n^{(2)}(x)$, 

$$
\begin{aligned}
G_n^{(2)}(x) = {} & [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2} \\
& + 2[F_2(x, x)]^{n-2}(F_1(x) - F_2(x, x))/[F_1(x)]^{n-2} \\
& + (n - 2)[F_2(x, x)]^{n-3}(F_1(x) - F_2(x, x))^2/[F_1(x)]^{n-3}(1 - F_1(x)).
\end{aligned}
$$

To prove this result we first note that $G_n^{(2)}(x)$, the probability that the second 
largest value is $\le x$, is given by 

$$
\begin{aligned}
G_n^{(2)}(x) = {} & \text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_n \le x) \\
& + \text{Prob}\,(x_1 > x,\ x_2 \le x,\ x_3 \le x,\ \ldots,\ x_n \le x) \\
& + \text{Prob}\,(x_1 \le x,\ x_2 > x,\ x_3 \le x,\ x_4 \le x,\ \ldots,\ x_n \le x) + \cdots \\
& + \text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_{n-2} \le x,\ x_{n-1} > x,\ x_n \le x) \\
& + \text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_{n-1} \le x,\ x_n > x).
\end{aligned}
\tag{23}
$$
According to Theorem I 

$$
\text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_n \le x) = [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2}. \tag{24}
$$

It can readily be shown that 

$$
\begin{aligned}
\text{Prob}\,(x_1 > x,\ x_2 \le x,\ x_3 \le x,\ \ldots,\ x_n \le x)
&= \text{Prob}\,(x_1 \le x,\ x_2 \le x,\ \ldots,\ x_{n-1} \le x,\ x_n > x) \\
&= [F_2(x, x)]^{n-2}[F_1(x) - F_2(x, x)]/[F_1(x)]^{n-2}.
\end{aligned}
\tag{25}
$$

It can also be shown that each of the remaining $(n - 2)$ terms on the right-hand 
side of (23) is equal to 

$$
[F_2(x, x)]^{n-3}[F_1(x) - F_2(x, x)]^2/[F_1(x)]^{n-3}[1 - F_1(x)]. \tag{26}
$$

Combining (23), (24), (25), and (26) we get the desired result in Theorem 
III, i.e., 

$$
\begin{aligned}
G_n^{(2)}(x) = {} & [F_2(x, x)]^{n-1}/[F_1(x)]^{n-2} \\
& + 2[F_2(x, x)]^{n-2}[F_1(x) - F_2(x, x)]/[F_1(x)]^{n-2} \\
& + (n - 2)[F_2(x, x)]^{n-3}[F_1(x) - F_2(x, x)]^2/[F_1(x)]^{n-3}(1 - F_1(x)).
\end{aligned}
\tag{27}
$$
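
Formula (27) can be checked in the same special case used for the extreme values. The sketch assumes a two-state stationary chain on {0, 1} with made-up transition probabilities and x = 0, where "second largest value ≤ 0" means that at most one observation equals 1. The formula is read as the sum of the three terms (24), twice (25), and (n - 2) times (26). All names and values are illustrative.

```python
from fractions import Fraction
from itertools import product

# Illustrative two-state stationary Markov chain on {0, 1} (values assumed).
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 2), Fraction(1, 2)]]
PI = [Fraction(2, 3), Fraction(1, 3)]      # stationary distribution

def path_prob(path):
    """Probability of observing the given sequence of states."""
    prob = PI[path[0]]
    for s, t in zip(path, path[1:]):
        prob *= P[s][t]
    return prob

def prob_second_largest_le_zero(n):
    """Pr{at most one of x_1, ..., x_n exceeds 0}, by enumeration."""
    return sum(path_prob(p) for p in product((0, 1), repeat=n) if sum(p) <= 1)

F1 = PI[0]                 # F_1(0)
F2 = PI[0] * P[0][0]       # F_2(0, 0)

def theorem_three(n):
    """Right-hand side of (27) at x = 0, for n >= 2."""
    d = F1 - F2
    return (F2 ** (n - 1) / F1 ** (n - 2)
            + 2 * F2 ** (n - 2) * d / F1 ** (n - 2)
            + (n - 2) * F2 ** (n - 3) * d ** 2 / (F1 ** (n - 3) * (1 - F1)))

print([theorem_three(n) for n in (2, 3, 4)])
```

For n = 2 the second largest value is the smallest value, and the expression reduces to 2F_1 - F_2.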


In a similar way one can prove Theorem IV. 

Theorem IV. Under assumption A, the distribution of second smallest values in 
samples of size $n$, $n \ge 2$, is given by the d.f. $H_n^{(2)}(x)$, 

$$
\begin{aligned}
H_n^{(2)}(x) = 1 & - \frac{[1 - 2F_1(x) + F_2(x, x)]^{n-1}}{[1 - F_1(x)]^{n-2}} \\
& - 2\,\frac{[1 - 2F_1(x) + F_2(x, x)]^{n-2}\,[F_1(x) - F_2(x, x)]}{[1 - F_1(x)]^{n-2}} \\
& - (n - 2)\,\frac{[1 - 2F_1(x) + F_2(x, x)]^{n-3}\,[F_1(x) - F_2(x, x)]^2}{[1 - F_1(x)]^{n-3}\,F_1(x)}.
\end{aligned}
\tag{28}
$$


REFERENCES 

[1] J. L. Doob, "The Brownian movement and stochastic equations," Annals of 
Mathematics, Vol. 43 (1942), p. 351. 

[2] M. C. Wang and G. E. Uhlenbeck, "On the theory of the Brownian motion II," Reviews 
of Modern Physics, Vol. 17 (1945), p. 323. 



NOTES 

This section is devoted to brief research and expository articles and other short items. 

NOTE ON THE CONSISTENCY OF THE MAXIMUM LIKELIHOOD ESTIMATE¹ 

By Abraham Wald 

Columbia University 

1. Introduction. The problem of consistency of the maximum likelihood 
estimate has been treated in the literature by several authors (see, for example, 
Doob [1]² and Cramér [2]³). The purpose of this note is to give another proof of the 
consistency of the maximum likelihood estimate which may be of interest because 
of its relative simplicity and because of the easy verifiability of the underlying 
assumptions. The present proof has some common features with that given by 
Doob, insofar that both proofs make no differentiability assumptions (thus, not 
even the existence of the likelihood equation is postulated) and both are based 
on the strong law of large numbers and an inequality involving the log of a 
random variable. The assumptions in the present note are stronger in some 
respects than those made by Doob, but also the results obtained here are stronger. 
For the sake of simplicity, the author did not attempt to give the most general 
results or to weaken the underlying assumptions as much as possible. Remarks 
on possible generalizations are made in Section 4. 

Let $X_1, X_2, \cdots$, etc., be independently and identically distributed chance
variables. The most frequently considered case in the literature is that where 
the common distribution is known, except for the values of a finite number of 

¹ The author wishes to thank J. L. Doob for several comments and suggestions he made
in connection with this note.

² According to a communication from Doob, his Theorem 4 is incorrect, but is correct if
the class of almost everywhere continuous functions in that theorem is replaced by a suitable
class C of functions. The class C can be any one of a variety of classes; for example, the class
of bounded almost everywhere continuous functions, or the larger class of almost everywhere
continuous functions each of which is less than or equal in modulus to any one of a
prescribed sequence of functions with finite expectations. His Theorem 6 on the consistency
of the maximum likelihood is then dependent on the class C used in Theorem 4.

³ The proof given by Cramér [2], pp. 500-504, establishes the consistency of some root
of the likelihood equation but not necessarily that of the maximum likelihood estimate
when the likelihood equation has several roots. Recently, Huzurbazar [3] showed that
under certain regularity conditions the likelihood equation has at most one consistent
solution and that the likelihood function has a relative maximum for such a solution.
Since there may be several solutions for which the likelihood function has relative maxima,
Cramér's and Huzurbazar's results taken together still do not imply that a solution of the
likelihood equation which makes the likelihood function an absolute maximum is necessarily
consistent.



parameters, $\theta^1, \theta^2, \cdots, \theta^k$. In this note we shall treat the parametric case. For
any parameter point $\theta = (\theta^1, \cdots, \theta^k)$, let $F(x, \theta)$ denote the corresponding
cumulative distribution function of $X_i$; i.e., $F(x, \theta) = \operatorname{prob}\{X_i < x\}$. The
totality $\Omega$ of all possible parameter points is called the parameter space. Thus,
the parameter space $\Omega$ is a subset of the $k$-dimensional Cartesian space.

It is assumed in this note that for any $\theta$, the cumulative distribution function
$F(x, \theta)$ admits an elementary probability law $f(x, \theta)$. If $F(x, \theta)$ is absolutely
continuous, $f(x, \theta)$ denotes the density at $x$. If $F(x, \theta)$ is discrete, $f(x, \theta)$ is equal
to the probability that $X_i = x$.

Throughout this note the following assumptions will be made.

Assumption 1. $F(x, \theta)$ is either discrete for all $\theta$ or is absolutely continuous
for all $\theta$.

Before formulating the next assumption, we shall introduce the following
notations: for any $\theta$ and for any positive value $\rho$ let $f(x, \theta, \rho)$ be the supremum of
$f(x, \theta')$ with respect to $\theta'$ when $|\theta - \theta'| \le \rho$. For any positive $r$, let $\varphi(x, r)$
be the supremum of $f(x, \theta)$ with respect to $\theta$ when $|\theta| > r$. Furthermore, let
$f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) > 1$, and $= 1$ otherwise. Similarly, let
$\varphi^*(x, r) = \varphi(x, r)$ when $\varphi(x, r) > 1$, and $= 1$ otherwise.

Assumption 2. For sufficiently small $\rho$ and for sufficiently large $r$ the expected
values $\int_{-\infty}^{\infty} \log f^*(x, \theta, \rho)\, dF(x, \theta_0)$ and $\int_{-\infty}^{\infty} \log \varphi^*(x, r)\, dF(x, \theta_0)$ are finite, where
$\theta_0$ denotes the true parameter point.⁴

Assumption 3. If $\lim_{i\to\infty} \theta_i = \theta$, then $\lim_{i\to\infty} f(x, \theta_i) = f(x, \theta)$ for all $x$ except perhaps
on a set which may depend on the limit point $\theta$ (but not on the sequence $\theta_i$) and
whose probability measure is zero according to the probability distribution
corresponding to the true parameter point $\theta_0$.

Assumption 4. If $\theta_1$ is a parameter point different from the true parameter point
$\theta_0$, then $F(x, \theta_1) \ne F(x, \theta_0)$ for at least one value of $x$.

Assumption 5. If $\lim_{i\to\infty} |\theta_i| = \infty$, then $\lim_{i\to\infty} f(x, \theta_i) = 0$ for any $x$ except perhaps
on a fixed set (independent of the sequence $\theta_i$) whose probability is zero according
to the true parameter point $\theta_0$.

Assumption 6. For the true parameter point $\theta_0$ we have

$\int_{-\infty}^{\infty} |\log f(x, \theta_0)|\, dF(x, \theta_0) < \infty.$

Assumption 7. The parameter space $\Omega$ is a closed subset of the $k$-dimensional
Cartesian space.

Assumption 8. $f(x, \theta, \rho)$ is a measurable function of $x$ for any $\theta$ and $\rho$.

It is of interest to note that if we forbid the dependence of the exceptional set
on $\theta$ in Assumption 3, Assumption 8 is a consequence of Assumption 3, as can
easily be verified.

⁴ The measurability of the functions $f^*(x, \theta, \rho)$ and $\varphi^*(x, r)$ for any $\theta$, $\rho$ and $r$ follows
easily from Assumption 8.





In the discrete case, Assumption 8 is unnecessary. In fact, we may replace
$f(x, \theta, \rho)$ everywhere by $\tilde f(x, \theta, \rho)$ where $\tilde f(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta_0) > 0$,
and $\tilde f(x, \theta, \rho) = 1$ when $f(x, \theta_0) = 0$. Here $\theta_0$ denotes the true parameter point.
Since $f(x, \theta_0) > 0$ only for countably many values of $x$, $\tilde f(x, \theta, \rho)$ is obviously a
measurable function of $x$.

In the absolutely continuous case, $F(x, \theta)$ does not determine $f(x, \theta)$ uniquely.
If Assumptions 3, 5 and 8 hold for one choice of $f(x, \theta)$, they do not necessarily
hold for another choice of $f(x, \theta)$. This is in a way undesirable, but assumptions
of such nature are unavoidable if we want to insure the consistency of the
maximum likelihood estimate. It is, however, possible to formulate assumptions
which remain valid for all possible choices of $f(x, \theta)$ and which insure the
consistency of the maximum likelihood estimate for a particular choice of $f(x, \theta)$.
In this connection the following remark due to Doob is of interest. Let
Assumptions 3' and 5' be the same as 3 and 5, respectively, except that the exceptional
set is permitted to depend on the sequence $\theta_i$. If 3' and 5' hold for one choice of
$f(x, \theta)$, they also hold for any other choice. Doob has shown that Assumptions 3'
and 5' insure the existence of a choice of $f(x, \theta)$ for which Assumptions 3, 5 and 8
hold. Thus, one may say that Assumptions 3' and 5' are the essential ones and
the stronger Assumptions 3, 5 and 8 are needed merely to exclude a "bad"
choice of $f(x, \theta)$.


2. Some lemmas. In this section we shall prove some lemmas which will be
used in the next section to obtain the main theorems. Let $\theta_0$ be the true parameter
point. By the expected value $Eu$ of any chance variable $u$ we shall mean the
expected value determined under the assumption that $\theta_0$ is the true parameter
point. For any chance variable $u$, $u'$ will denote the chance variable which is
equal to $u$ when $u > 0$ and equal to zero otherwise. Similarly, for any chance
variable $u$, the symbol $u''$ will be used to denote the chance variable which is
equal to $u$ when $u < 0$ and equal to zero otherwise. We shall say that the expected
value of $u$ exists if $Eu' < \infty$. If the expected value of $u'$ is finite but that of $u''$
is not, we shall say that the expected value of $u$ is equal to $-\infty$.

Lemma 1. For any $\theta \ne \theta_0$ we have

(1)    $E \log f(X, \theta) < E \log f(X, \theta_0)$

where $X$ is a chance variable with the distribution $F(x, \theta_0)$.

Proof. It follows from Assumption 2 that the expected values in (1) exist.
Because of Assumption 6, we have

(2)    $E\,|\log f(X, \theta_0)| < \infty.$

If $E \log f(X, \theta) = -\infty$, Lemma 1 obviously holds. Thus, we shall merely consider
the case when $E \log f(X, \theta) > -\infty$. Then

(3)    $E\,|\log f(X, \theta)| < \infty.$

Let $u = \log f(X, \theta) - \log f(X, \theta_0)$.⁵ Clearly, $E|u| < \infty$. It is known that for
any chance variable $u$ which is not equal to a constant (with probability one)
and for which $E|u| < \infty$, we have⁶

(4)    $Eu < \log E e^u.$

Since in our case

(5)    $E e^u \le 1,$

and since $u$ differs from zero on a set of positive probability (due to Assumption
4), we obtain from (4)

(6)    $Eu < 0.$

Thus, Lemma 1 is proved.
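
Inequality (1) can be checked numerically for a concrete family. The following is only a sketch under stated assumptions: the family $N(\theta, 1)$, the integration range, and the function names `normal_logpdf` and `expected_loglik` are illustrative choices that do not appear in the note. For this family the expectation has the closed form $-\tfrac{1}{2}\log 2\pi - \tfrac{1}{2}[1 + (\theta - \theta_0)^2]$, which is largest at $\theta = \theta_0$.

```python
import math

def normal_logpdf(x, mean):
    """Log density of N(mean, 1) at x (illustrative family, not from the note)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - mean) ** 2

def expected_loglik(theta, theta0, lo=-12.0, hi=12.0, steps=4000):
    """Trapezoidal approximation to E log f(X, theta) when X ~ N(theta0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = theta0 + lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_logpdf(x, theta) * math.exp(normal_logpdf(x, theta0))
    return total * h

theta0 = 1.0
at_truth = expected_loglik(theta0, theta0)
for theta in (-1.0, 0.0, 0.5, 1.5, 3.0):
    # Inequality (1): the expected log-likelihood is strictly smaller away from theta0.
    print(theta, expected_loglik(theta, theta0) < at_truth)
```

The gap between the two sides of (1) is what later drives the likelihood ratio to zero in Theorem 1.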

We shall now prove the following lemma.

Lemma 2. $\lim_{\rho \to 0} E \log f(X, \theta, \rho) = E \log f(X, \theta)$.

Proof. Let $f^*(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \ge 1$, and $= 1$ otherwise.
Similarly, let $f^*(x, \theta) = f(x, \theta)$ when $f(x, \theta) \ge 1$, and $= 1$ otherwise. It follows
from Assumption 3 that

(7)    $\lim_{\rho \to 0} \log f^*(x, \theta, \rho) = \log f^*(x, \theta)$

except perhaps on a set whose probability measure is zero. Since $\log f^*(x, \theta, \rho)$
is an increasing function of $\rho$, it follows from (7) and Assumption 2 that

(8)    $\lim_{\rho \to 0} E \log f^*(X, \theta, \rho) = E \log f^*(X, \theta).$

Let $f^{**}(x, \theta, \rho) = f(x, \theta, \rho)$ when $f(x, \theta, \rho) \le 1$, and $= 1$ otherwise. Similarly, let
$f^{**}(x, \theta) = f(x, \theta)$ when $f(x, \theta) \le 1$, and $= 1$ otherwise. Clearly,

(9)    $|\log f^{**}(x, \theta, \rho)| \le |\log f^{**}(x, \theta)|$

and

(10)    $\lim_{\rho \to 0} \log f^{**}(x, \theta, \rho) = \log f^{**}(x, \theta)$

for all $x$ except perhaps on a set whose probability measure is zero. The relation

(11)    $\lim_{\rho \to 0} E \log f^{**}(X, \theta, \rho) = E \log f^{**}(X, \theta)$

follows from (9) and (10) in both cases, when $E \log f^{**}(X, \theta)$ is finite and when
$E \log f^{**}(X, \theta) = -\infty$. Lemma 2 is an immediate consequence of (8) and (11).

Lemma 3. The equation

(12)    $\lim_{r \to \infty} E \log \varphi(X, r) = -\infty$

holds.


⁵ It is of no consequence what value is assigned to $u$ when $f(x, \theta)$ or $f(x, \theta_0)$ is zero, since
the probability of such an event, because of (3), is zero.

⁶ This is a generalization of the inequality between geometric and arithmetic means.
See, for example, Hardy, Littlewood, Pólya, Inequalities, Cambridge 1934, p. 137,
Theorem 184.





Proof. It follows from Assumption 5 that

(13)    $\lim_{r \to \infty} \log \varphi(x, r) = -\infty$

for any $x$ (except perhaps on a set of probability 0). Since according to
Assumption 2,

(14)    $E \log \varphi^*(X, r) < \infty,$

and since $\log \varphi(x, r) - \log \varphi^*(x, r)$ and $\log \varphi^*(x, r)$ are decreasing functions of
$r$, Lemma 3 follows easily from (13).


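3. The main theorems. We shall now prove the following theorems.

Theorem 1. Let $\omega$ be any closed subset of the parameter space $\Omega$ which does not
contain the true parameter point $\theta_0$. Then

(15)    $\operatorname{prob}\left\{ \lim_{n \to \infty} \dfrac{\sup_{\theta \in \omega} f(X_1, \theta) f(X_2, \theta) \cdots f(X_n, \theta)}{f(X_1, \theta_0) f(X_2, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1.$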


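Proof. Let $r_0$ be a positive number chosen such that

(16)    $E \log \varphi(X, r_0) < E \log f(X, \theta_0).$

The existence of such a positive number follows from Lemma 3. Let $\omega_1$ be the
subset of $\omega$ consisting of all points $\theta$ of $\omega$ for which $|\theta| \le r_0$. With each point $\theta$
in $\omega_1$ we associate a positive value $\rho_\theta$ such that

(17)    $E \log f(X, \theta, \rho_\theta) < E \log f(X, \theta_0).$

The existence of such a $\rho_\theta$ follows from Lemmas 1 and 2. Since the set $\omega_1$ is
compact, there exists a finite number of points $\theta_1, \cdots, \theta_h$ in $\omega_1$ such that
$S(\theta_1, \rho_{\theta_1}) + \cdots + S(\theta_h, \rho_{\theta_h})$ contains $\omega_1$ as a subset. Here $S(\theta, \rho)$ denotes the
sphere with center $\theta$ and radius $\rho$. Clearly,

$0 \le \sup_{\theta \in \omega} f(x_1, \theta) \cdots f(x_n, \theta) \le \sum_{i=1}^{h} f(x_1, \theta_i, \rho_{\theta_i}) \cdots f(x_n, \theta_i, \rho_{\theta_i}) + \varphi(x_1, r_0) \cdots \varphi(x_n, r_0).$

Hence, Theorem 1 is proved if we can show that

(18)    $\operatorname{prob}\left\{ \lim_{n \to \infty} \dfrac{f(X_1, \theta_i, \rho_{\theta_i}) \cdots f(X_n, \theta_i, \rho_{\theta_i})}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1 \qquad (i = 1, \cdots, h)$

and

(19)    $\operatorname{prob}\left\{ \lim_{n \to \infty} \dfrac{\varphi(X_1, r_0) \cdots \varphi(X_n, r_0)}{f(X_1, \theta_0) \cdots f(X_n, \theta_0)} = 0 \right\} = 1.$

The above equations can be written as

(20)    $\operatorname{prob}\left\{ \lim_{n \to \infty} \sum_{\alpha=1}^{n} [\log f(X_\alpha, \theta_i, \rho_{\theta_i}) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1 \qquad (i = 1, \cdots, h)$

and

(21)    $\operatorname{prob}\left\{ \lim_{n \to \infty} \sum_{\alpha=1}^{n} [\log \varphi(X_\alpha, r_0) - \log f(X_\alpha, \theta_0)] = -\infty \right\} = 1.$

These equations follow immediately from (16), (17) and the strong law of large
numbers. This completes the proof of Theorem 1.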
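
The behaviour asserted in (15) can be watched on simulated data. The sketch below assumes the family $N(\theta, 1)$ and takes $\omega = \{\theta : |\theta - \theta_0| \ge \epsilon\}$; these choices and the name `sup_log_ratio` are illustrative, not part of the note. For this family the logarithm of the ratio at $\theta_0 + t$ equals $n(t d - t^2/2)$ with $d = \bar{x} - \theta_0$, so the supremum over $\omega$ is available in closed form.

```python
import random

def sup_log_ratio(xs, theta0, eps):
    """Log of sup over |theta - theta0| >= eps of prod f(x, theta) / prod f(x, theta0)
    for the N(theta, 1) family (an assumed example family)."""
    n = len(xs)
    d = sum(xs) / n - theta0              # sample mean minus true mean
    # The log ratio at theta0 + t is n * (t * d - t**2 / 2), a concave parabola in t
    # with vertex at t = d; the constraint is |t| >= eps.
    if abs(d) >= eps:
        t = d
    else:
        t = eps if d >= 0 else -eps
    return n * (t * d - 0.5 * t * t)

rng = random.Random(0)
theta0, eps = 0.0, 0.5
xs = []
for n in (100, 1000, 10000):
    while len(xs) < n:
        xs.append(rng.gauss(theta0, 1.0))
    # The printed values should drift toward minus infinity as n grows.
    print(n, sup_log_ratio(xs, theta0, eps))
```

A log ratio tending to $-\infty$ corresponds to the ratio in (15) tending to zero.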

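Theorem 2. Let $\bar\theta_n(x_1, \cdots, x_n)$ be a function of the observations $x_1, \cdots, x_n$
such that

(22)    $\dfrac{f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$ for all $n$ and for all $x_1, \cdots, x_n$.

Then

(23)    $\operatorname{prob}\{\lim_{n \to \infty} \bar\theta_n = \theta_0\} = 1.$

Proof. It is sufficient to prove that for any $\epsilon > 0$ the probability is one that all
limit points $\bar\theta$ of the sequence $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. The
event that there exists a limit point $\bar\theta$ of the sequence $\{\bar\theta_n\}$ such that $|\bar\theta - \theta_0| > \epsilon$
implies that $\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta) \ge f(x_1, \bar\theta_n) \cdots f(x_n, \bar\theta_n)$ for infinitely
many $n$. But then

(24)    $\dfrac{\sup_{|\theta - \theta_0| \ge \epsilon} f(x_1, \theta) \cdots f(x_n, \theta)}{f(x_1, \theta_0) \cdots f(x_n, \theta_0)} \ge c > 0$

for infinitely many $n$. Since, according to Theorem 1, this is an event with
probability zero, we have shown that the probability is one that all limit points
$\bar\theta$ of $\{\bar\theta_n\}$ satisfy the inequality $|\bar\theta - \theta_0| \le \epsilon$. This completes the proof of
Theorem 2.

Since a maximum likelihood estimate $\hat\theta_n(x_1, \cdots, x_n)$, if it exists, obviously
satisfies (22) with $c = 1$, Theorem 2 establishes the consistency of $\hat\theta_n(x_1, \cdots, x_n)$
as an estimate of $\theta$.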
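
A small simulation can illustrate the consistency statement for an estimate obtained by direct maximization, with no use of the likelihood equation. This is a sketch under assumptions that are not in the note: a Cauchy location family with unit scale (for which the likelihood equation may have several roots), a parameter space restricted to a finite grid on $[-5, 5]$, and the hypothetical helper names `cauchy_loglik` and `grid_mle`.

```python
import math
import random

def cauchy_loglik(xs, theta):
    """Log-likelihood of a Cauchy location model with unit scale (assumed family)."""
    return -sum(math.log(math.pi * (1.0 + (x - theta) ** 2)) for x in xs)

def grid_mle(xs, lo=-5.0, hi=5.0, step=0.02):
    """Maximize the log-likelihood over a finite grid; no derivatives are used."""
    count = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(count)]
    return max(grid, key=lambda theta: cauchy_loglik(xs, theta))

rng = random.Random(1)
theta0 = 1.0
# Inverse-CDF sampling of Cauchy deviates centred at theta0.
sample = [theta0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(1000)]
for n in (10, 100, 1000):
    # The estimates should settle near theta0 as n grows.
    print(n, grid_mle(sample[:n]))
```

The grid maximizer is an absolute maximum over the grid, so it satisfies (22) with $c = 1$ relative to that restricted parameter space whenever $\theta_0$ is a grid point.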


4. Remarks on possible generalizations. The method given in this note can be 
extended to establish the consistency of the maximum likelihood estimates for 
certain types of dependent chance variables for which the strong law of large 
numbers remains valid. 

The assumption that the parameter space $\Omega$ is a subset of a finite dimensional
Cartesian space is unnecessarily restrictive. Let $\Omega$ be any abstract space. All of





our results can easily be shown to remain valid if Assumptions 3, 5 and 7 are
replaced by the following one:

Assumption 9. It is possible to introduce a distance $\delta(\theta_1, \theta_2)$ in the space $\Omega$ such
that the following four conditions hold:

(i) The distance $\delta(\theta_1, \theta_2)$ makes $\Omega$ a metric space.

(ii) $\lim_{i \to \infty} f(x, \theta_i) = f(x, \theta)$ if $\lim_{i \to \infty} \theta_i = \theta$ for any $x$ except perhaps on a set which
may depend on $\theta$ (but not on the sequence $\theta_i$) and whose probability measure is zero
according to the probability distribution corresponding to the true parameter point $\theta_0$.

(iii) If $\theta_0$ is a fixed point in $\Omega$ and $\lim_{i \to \infty} \delta(\theta_i, \theta_0) = \infty$, then $\lim_{i \to \infty} f(x, \theta_i) = 0$
for any $x$.

(iv) Any closed and bounded subset of $\Omega$ is compact.

REFERENCES

[1] J. L. Doob, "Probability and statistics," Trans. Amer. Math. Soc., Vol. 36 (1934).

[2] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton,
    1946.

[3] V. S. Huzurbazar, "The likelihood equation, consistency and the maxima of the
    likelihood function," Annals of Eugenics, Vol. 14 (1948).


ON WALD'S PROOF OF THE CONSISTENCY OF THE MAXIMUM 
LIKELIHOOD ESTIMATE 

By J. Wolfowitz 
Columbia University 

This note is written by way of comment on the pretty and ingenious proof of 
the consistency of the maximum likelihood estimate which is due to Wald and is 
printed in the present issue of the Annals. The notation of this paper of Wald’s 
will henceforth be assumed unless the contrary is specified. 

The consistency of the maximum likelihood estimate is a “weak” rather than 
a “strong” property, in the technical meaning which these words have in the 
theory of probability, i.e., it is a property of distribution functions rather than of 
infinite sequences of observations. Prof. Wald actually proves strong convergence, 
which is more than consistency. His proof uses the strong law of large numbers,
and he remarks that his method “can be extended to establish consistency of the 
maximum likelihood estimates for certain types of dependent chance variables 
for which the strong law of large numbers remains valid.” Below we shall use 
Wald’s lemmas to give a proof of consistency which employs only the weak law 
of large numbers. Not only does this proof have the advantage of being expeditious,
but it can be extended to a larger class of dependent chance variables.

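The consistency of the maximum likelihood estimate follows from the following

Theorem. Let $\eta$ and $\epsilon$ be given, arbitrarily small, positive numbers. Let $S(\theta_0, \eta)$
be the open sphere with center $\theta_0$ and radius $\eta$, and let $\Omega(\eta) = \Omega - S(\theta_0, \eta)$. Let
Wald's Assumptions 1-8 hold. There exists a number $h(\eta)$, $0 < h < 1$, and another
positive number $N(\eta, \epsilon)$ such that, for any $n > N(\eta, \epsilon)$,

$P\left\{ \sup_{\theta \in \Omega(\eta)} \dfrac{\prod_{i=1}^{n} f(X_i, \theta)}{\prod_{i=1}^{n} f(X_i, \theta_0)} > h^n \right\} < \epsilon,$

where $P\{\ \}$ is the probability of the relation in braces according to $f(x, \theta_0)$.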

Proof: Proceed exactly as in the proof of Wald's Theorem 1 and obtain $r_0$,
$\theta_1, \cdots, \theta_h$ so that the set theoretic sum of the open spheres $S(\theta_i, \rho_{\theta_i})$, $i = 1$,
$2, \cdots, h$, covers the compact set which is the intersection of $\Omega(\eta)$ with the
sphere $|\theta| \le r_0$. Define $T(\theta_i)$, $i = 1, \cdots, h + 1$, as follows:

$-2T(\theta_i) = E \log f(X, \theta_i, \rho_{\theta_i}) - E \log f(X, \theta_0) \qquad (i = 1, \cdots, h)$

$-2T(\theta_{h+1}) = E \log \varphi(X, r_0) - E \log f(X, \theta_0).$

If any of the right rromlien* abme arc Ih ft*,# t#» nor, M y„ Thus all 

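$T(\theta_i)$ are positive. Applying the weak law of large numbers we have that, for any
$i$ such that $1 \le i \le h + 1$, there exists a positive number $N_i$ such that, when $n > N_i$,

$P\left\{ \dfrac{\prod_{j=1}^{n} f(X_j, \theta_i, \rho_{\theta_i})}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_i)) \right\} < \dfrac{\epsilon}{h + 1} \qquad (i = 1, \cdots, h)$

$P\left\{ \dfrac{\prod_{j=1}^{n} \varphi(X_j, r_0)}{\prod_{j=1}^{n} f(X_j, \theta_0)} > \exp(-nT(\theta_{h+1})) \right\} < \dfrac{\epsilon}{h + 1}.$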

From this the theorem follows immediately, with

$N(\eta, \epsilon) = \max_i N_i,$

$h(\eta) = \max_i \exp\{-T(\theta_i)\}.$
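
The statement of the theorem can be explored by Monte Carlo. The sketch below is an illustration under assumptions that are not in the note: the family is $N(\theta, 1)$, for which the supremum over $|\theta - \theta_0| \ge \eta$ has the closed form $n(t d - t^2/2)$ with $d = \bar{x} - \theta_0$ and $t$ the admissible point nearest $d$; the value $h = \exp(-\eta^2/4)$ and the name `exceed_frequency` are hypothetical choices made only for this example.

```python
import math
import random

def exceed_frequency(n, trials, rng, theta0=0.0, eta=0.5):
    """Fraction of simulated samples of size n for which the supremum of the
    likelihood ratio over |theta - theta0| >= eta exceeds h**n."""
    h = math.exp(-eta * eta / 4.0)        # illustrative choice with 0 < h < 1
    hits = 0
    for _ in range(trials):
        d = sum(rng.gauss(theta0, 1.0) for _ in range(n)) / n - theta0
        # Nearest admissible offset t (|t| >= eta) to the unconstrained maximizer d.
        t = d if abs(d) >= eta else math.copysign(eta, d)
        log_sup_ratio = n * (t * d - 0.5 * t * t)
        if log_sup_ratio > n * math.log(h):
            hits += 1
    return hits / trials

rng = random.Random(2)
for n in (5, 50, 500, 2000):
    # The frequency estimates the probability bounded by epsilon in the theorem.
    print(n, exceed_frequency(n, 100, rng))
```

Only convergence in probability of the sample mean is used here, which mirrors the role of the weak law in the proof.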

The author is obliged to Prof. Wald for his kindness in making his paper
available to the author.





A NOTE ON RANDOM WALK
By Herbert T. David

The Johns Hopkins University Institute for Cooperative Research

A random walk is defined as a series of discrete steps along the real line, here
denoted by $I$. Each step is represented by the chance variable $X$, with sectionally
continuous density function $f(t)$. The walk begins at any point $a$ of $I$, and
continues until a step carries us outside some subregion $\Omega$ of $I$. In this note, $\Omega$ is
taken as a finite interval with upper bound $D$ and lower bound $D - y$. The
chance variables $N$ and $Z$ are, respectively, the number of steps required to end
the walk, and the endpoint of the walk. The range of $Z$ always excludes $\Omega$.
Below we define $x = D - a$, and consider $E(N)$ as a function $G(x, y)$ of $x$
and $y$. Under specified conditions, a differential equation (32) is derived, relating
$G(0, y)$ and $G(x, y)$.
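
The quantities just defined can be simulated directly. The following sketch assumes a standard normal step density and places the upper bound at $D = 0$; both choices, and the names `walk_length` and `mean_walk_length`, are illustrative and not taken from the note.

```python
import random

def walk_length(a, D, y, rng):
    """Number of steps N until a walk started at a first leaves the interval [D - y, D]."""
    position, steps = a, 0
    while True:
        position += rng.gauss(0.0, 1.0)   # step density f: standard normal (assumed)
        steps += 1
        if position < D - y or position > D:
            return steps

def mean_walk_length(x, y, trials, rng, D=0.0):
    """Monte Carlo estimate of E(N) = G(x, y), where x = D - a."""
    a = D - x
    return sum(walk_length(a, D, y, rng) for _ in range(trials)) / trials

# Start in the middle of an interval of length 1.
print(mean_walk_length(0.5, 1.0, 20000, random.Random(4)))
```

Every walk takes at least one step, so the estimate cannot fall below 1.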

Let 


(1) 

*i(0 - /(( - a) 

MO ^ f 0> ~ 1) / fl/W 

(2) 

/ ft - a - L ll 'M ■ • • %-i; 

where 

j"« 4 Z a»"| <a , for 1» 2, • • •,« - 

Then 

/'!Z «ici,jY » n] * [ MO dt 

»W| 


■> n) “ 0 

Hence 

joj,V « n\ ** f MO dl 

(3) 

Ki. Y-Oi if MO dt. 


n > 1 


for 

for wn ft. 


The transformation |A, «4~ Ei*i 9i **; U 

the more convenient expression 


n ~ 1] gives for M) 


!• 

\lfM) »« 

Jit 


in — I) 


[/(ft, - <*) 

st**S 

* U M 


hi-iW - k-0 dhl "' dhn ~ l ■ 


(4) 


EOl 



no-i 




Thr n-fntd integral j $Jit dt w atmluteh* convergent, hence may bo Jntc- 
grnted first with respect to t. This gives, keeping the notation of /•!) 

(5) f 4*[l) dt * f $n i(A» \> dh* i. 

Jf 

Assuming that EG\ \ remains futile for all considered a and ft, series (3) may be 

«S< * 

rearranged, giving : E{N) ** £ E, when* 

i *■*! 



Now, fh - £ PIN SLeS 1 1 « 1. AI*o, tiring ih) and induction on n, it is readily 

shown that fh ** ( ill t so that 

Jq 

(0) E{N) = l + f; 

**•1 

Define transformations $T_n$: ($g_i = D - h_i$, $i = 1, \cdots, n - 1$; $g_n = D - t$).

Substituting expressions (1) and (4) in (6), transform the $n$th term of the
summation by $T_n$. This gives

(7)    $E(N) = 1 + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, dg_1 \cdots dg_n$

where $x = D - a$.

By (7), $E(N)$ is a function of $x$ and $y$, hence we write $E(N) = G(x, y)$.
Define:

$M(k)$ : Max $f(t)$ for $|t| < k$.

K : Any number satisfying K < [1 - 
$R$ : Any region $[-\infty < x < \infty;\ 0 < y < K]$.

$M$ : Max $f(t)$.

$L$ : Any number satisfying $L < [1 - \epsilon]/M$.

$R'$ : Any region $[-\infty < x < \infty;\ 0 < y < L]$.

In the ensuing argument, we shall assume that

(8)    $(x, y) \in R.$

This condition restricts certain one-dimensional and two-dimensional variables 
to regions over which some infinite series are uniformly convergent with respect 
to these variables, Uniform convergence is required to validate term-by-term 
differentiations and integrations, and to establish the continuity in one or two 
variables of certain functions represented by series. 

Arguments dealing with the solution of integral equations (17), (20) and (25)
are valid only under the more restrictive condition

(9)    $(x, y) \in R'$,

this being the general sufficiency condition for the existence of solutions.
However, (17) and (20) enter the argument with respect only to the derivation of
equation (21) which could have been derived, though in a more cumbersome
manner, by a term by term comparison of the series expressions for $[\lambda_{01}(x, y)]
[G(y, y)]$ and for $[G_{01}(x, y)][\lambda(y, y)]$, this latter approach being valid under (8).
Similarly, (25) is used only in obtaining (27), which could have been obtained
by a direct manipulation of the series expression for $G(x, y)$, this approach also
being valid under (8). Hence, all subsequent derivations hold, as long as $(x, y) \in R$.
By (8), we may interchange summation and integration with respect to $g_1$
in (7). This gives

(10)    $G(x, y) = 1 + \int_0^y f(x - g)\, G(g, y)\, dg.$
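
Equation (10) is an integral equation of the second kind and can be solved numerically on a grid. The sketch below is only an illustration: it assumes a standard normal step density, replaces the integral by the trapezoidal rule, and iterates by successive substitution, which converges when $y \cdot \max f < 1$. The names `step_density` and `solve_expected_steps` are hypothetical.

```python
import math

def step_density(t):
    """Standard normal density, used here as an assumed example of f(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def solve_expected_steps(y, nodes=101, sweeps=60):
    """Approximate G(g, y) on an even grid over [0, y] from equation (10),
    G(x, y) = 1 + integral_0^y f(x - g) G(g, y) dg,
    using trapezoidal quadrature and successive substitution."""
    h = y / (nodes - 1)
    grid = [k * h for k in range(nodes)]
    weights = [h] * nodes
    weights[0] = weights[-1] = 0.5 * h
    kernel = [[step_density(x - g) * w for g, w in zip(grid, weights)] for x in grid]
    G = [1.0] * nodes                     # zeroth iterate: G = 1
    for _ in range(sweeps):
        G = [1.0 + sum(k * v for k, v in zip(row, G)) for row in kernel]
    return grid, G

grid, G = solve_expected_steps(1.0)
print(G[0], G[len(G) // 2], G[-1])
```

Each sweep adds one more term of the series (7), so the iterates increase toward the solution. With a symmetric density the computed values at $x = 0$ and $x = y$ agree up to rounding.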


(11)    Assume that $f(t)$ has a continuous derivative everywhere.

Then $f(t)$ is continuous and $G(x, y)$ is continuous by (7) and (8). Hence

(12)    $f(x - g)G(g, y)$ and $\frac{\partial}{\partial x} f(x - g)G(g, y)$ are continuous in $(x, g)$;
        $f(x - g)G(g, y)$ is continuous in $(g, y)$.

Let $G_{ij}(x, y)$ denote

$\dfrac{\partial^{i+j} G(x, y)}{\partial x^i\, \partial y^j}.$

Then by (12), we may differentiate (10) with respect to $x$, and, since
$\frac{\partial}{\partial x} f(x - g) = -\frac{\partial}{\partial g} f(x - g)$, integration by parts yields

(14)    $G_{10}(x, y) = f(x)G(0, y) - f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{10}(g, y)\, dg.$

Further under (8), $G_{01}(x, y)$ may be obtained by differentiating (7) term by term,
and we may differentiate (10) with respect to $y$, giving

(15)    $G_{01}(x, y) = f(x - y)G(y, y) + \int_0^y f(x - g)\, G_{01}(g, y)\, dg.$

Adding (14) to (15), dividing by $G(0, y)$ which is always greater than or equal to 1,
and letting

(16)    $\lambda(x, y) = [G_{10}(x, y) + G_{01}(x, y)]/G(0, y)$

we obtain

(17)    $\lambda(x, y) = f(x) + \int_0^y f(x - g)\, \lambda(g, y)\, dg.$




Under (9), (17) defines a function

(18)    $\lambda(x, y) = f(x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n)\, dg_1 \cdots dg_n.$

By (8), this function is continuous in $(x, y)$ and may be differentiated term by
term with respect to $y$. Further, $\lambda_{01}(x, y)$ thus gotten is continuous in $(x, y)$, so
that $f(x - g)\lambda_{01}(g, y)$ is continuous in $(g, y)$. Hence, (17) may be differentiated
with respect to $y$, giving

(19)    $\lambda_{01}(x, y) = f(x - y)\lambda(y, y) + \int_0^y f(x - g)\, \lambda_{01}(g, y)\, dg.$

Since, under (9), the integral equation

(20)    $\alpha(x, y) = f(x - y) + \int_0^y f(x - g)\, \alpha(g, y)\, dg$

has a unique continuous solution for every fixed $y$, (15) and (19) give

(21)    $\dfrac{\lambda_{01}(x, y)}{\lambda(y, y)} = \dfrac{G_{01}(x, y)}{G(y, y)}.$

Hence

$\dfrac{\int_0^y \lambda_{01}(x, y)\, dx}{\lambda(y, y)} = \dfrac{\int_0^y G_{01}(x, y)\, dx}{G(y, y)}$

and

(22)    $\dfrac{\frac{d}{dy} \int_0^y \lambda(x, y)\, dx}{\lambda(y, y)} = \dfrac{\frac{d}{dy} \int_0^y G(x, y)\, dx}{G(y, y)}.$

(23)    Let $f(t) = f(-t)$.

Then it is obvious from the definition that

(24)    $G(0, y) = G(y, y).$

Further, by (15),

(25)    $\dfrac{G_{01}(x, y)}{G(y, y)} = f(x - y) + \int_0^y f(x - g)\, \dfrac{G_{01}(g, y)}{G(y, y)}\, dg$

so that, under (9), (25) gives for $G_{01}(x, y)/G(y, y)$ the unique expression

$f(x - y) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(x - g_1) \prod_{i=1}^{n-1} f(g_i - g_{i+1})\, f(g_n - y)\, dg_1 \cdots dg_n$

which, by (23), is equal to

$f(y - x) + \sum_{n=1}^{\infty} \int_0^y \cdots (n) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n.$

Integrating with respect to $x$, it follows that

(26)    $\int_0^y \dfrac{G_{01}(x, y)}{G(y, y)}\, dx = \int_0^y f(y - x)\, dx + \sum_{n=1}^{\infty} \int_0^y \cdots (n + 1) \cdots \int_0^y f(y - g_n) \prod_{i=1}^{n-1} f(g_{i+1} - g_i)\, f(g_1 - x)\, dg_1 \cdots dg_n\, dx$

which, by a change of integration indices and a referral to (7), is seen to equal
$[G(y, y) - 1]$. (26) thus gives


(27)    $\int_0^y G_{01}(x, y)\, dx = G(y, y)[G(y, y) - 1].$

Further, by (16), (24), and (27),

(28)    $\int_0^y \lambda(x, y)\, dx = G(0, y) - 1$

so that

(29)    $\dfrac{d}{dy} \int_0^y \lambda(x, y)\, dx = \dfrac{d}{dy} G(0, y)$

while (24) and (27) also yield

(30)    $\dfrac{d}{dy} \int_0^y G(x, y)\, dx = [G(0, y)]^2.$

Hence, by (22), (29), and (30),

(31)    $\lambda(y, y) = \dfrac{d}{dy} G(0, y) \Big/ G(0, y).$

Finally, substituting (31) in (21), and remembering the definition of $\lambda$ given in
(16), we get, using (24),

(32)    $G(0, y)[G_{11}(x, y) + G_{02}(x, y)] = \dfrac{d}{dy} G(0, y)\, [G_{10}(x, y) + 2G_{01}(x, y)].$

The conditions under which (32) holds are, in summary, (8), (11), and (23). 


If $f(t)$ has an expansion

(33)    $f(t) = \sum_{i=0}^{\infty} A_i t^i, \qquad |t| < T,$

it is clear from (7) that

(34)    $G(x, y) = \sum_{i,j=0}^{\infty} B_{ij} x^i y^j$


for (x, y) « 5, where S : [T„ < * < ft; 0 < V < Ti + T & T » - ^ < T ' 




Substituting (34) in (32), and equating coefficients of like powers of $(x, y)$,
we obtain the recursion formulae

(35) T, /f„/0bj[2fr-j-t li- r + ilb - JfcJ; i;0, 1,.... 

fHhk*** | g 

From (10), it is readily verified that $B_{i0} = 0$ for $i \ne 0$, so that equations (35)
give solutions for the $B_{ij}$ in terms of the $B_{0j}$. These solutions are of interest
since they show a one-to-one correspondence between the functions $G(0, y)$
and $G(x, y)$, for $(x, y) \in [S \cap R]$.


NUMERICAL INTEGRATION FOR LINEAR SUMS OF EXPONENTIAL
FUNCTIONS

By Robert E. Greenwood

The University of Texas and the Institute for Numerical Analysis

1. Introduction. The methods of numerical integration going by the names
trapezoidal rule, Simpson's rule, Weddle's rule, and the Newton-Cotes formulae
are of the type

(1)    $\int_{-1}^{1} f(x)\, dx \approx \sum_{i=0}^{n} \lambda_{in} f(x_{in})$

where the abscissae $\{x_{in}\}$ are uniformly distributed on a finite interval, chosen
as $(-1, 1)$ for convenience,

(2)    $x_{in} = -1 + \dfrac{2i}{n}, \qquad i = 0, 1, 2, \cdots, n,$

and where the set of constants $\{\lambda_{in}\}$ depend on the name of the rule and the value
of $n$ but not on the function $f(x)$. Throughout this note all abscissae will be
assumed to be uniformly distributed on $(-1, 1)$ unless the contrary is explicitly
stated.

Since correspondence relation (1) involves $(n + 1)$ constants $\{\lambda_{in}\}$, it might
be possible to choose $(n + 1)$ arbitrary functions $g_j(x)$, $j = 0, 1, 2, \cdots, n$,
and require that the set $\{\lambda_{in}\}$ be the solution, if such exists, of the $(n + 1)$
simultaneous linear equations

(3)    $\int_{-1}^{1} g_j(x)\, dx = \sum_{i=0}^{n} \lambda_{in}\, g_j(x_{in}), \qquad j = 0, 1, 2, \cdots, n.$

Indeed, the selection

(4)    $g_j(x) = x^j, \qquad j = 0, 1, 2, \cdots, n,$

will give a set of $(n + 1)$ simultaneous equations of form (3) and the solution $\{\lambda_{in}\}$
is the set of Newton-Cotes weights for that value of $n$. The numerical evaluation
1 This work was performed with the financial support of the Office of Naval Research of 
the Navy Department. 





of $\{\lambda_{in}\}$ is best accomplished by other and more sophisticated methods² however.

Because of linearity in both the integral and the finite summation, once the
constants $\{\lambda_{in}\}$ have been determined for a specific set of functions $\{g_j(x)\}$
correspondence relation (1) is exact for any linear combination of that fundamental
set. Thus, for example, for the fundamental set (4), correspondence
relation (1) with the appropriate values $\{\lambda_{in}\}$ is exact for all polynomials of
degree less than or equal to $n$.

Although tradition favors the set of functions (4), there is nothing compelling
about such a selection. Indeed, two other possible choices might be

(5)    $g_j(x) = e^{jx}, \qquad j = 0, 1, 2, \cdots, n$

and

(6)    $g_j(x) = e^{jx}, \qquad j = -m, -m + 1, \cdots, 0, 1, \cdots, m - 1, m; \quad n = 2m.$

These choices would seem to be appropriate whenever numerical methods are
being applied to exponential growth curves or exponential decay curves.


2. Use of the basic set $g_j(x) = e^{jx}$. If integration relation (1) be made exact
for the set $\{e^{jx}\}$, $j = 0, 1, \cdots, n$ with evenly spaced $x$ abscissae, the set (3) of
$(n + 1)$ simultaneous linear equations in the unknowns $\{\lambda_{in}\}$, $i = 0, 1, \cdots, n$
is obtained. Call the solution of this system $\{a_{in}\}$; solution values for $n = 1, 2,
3, 4, 5, 6$ are tabulated below.

For the symmetric case where integration relation (1) is made exact for
$\{e^{jx}\}$, $j = -m, -m + 1, \cdots, m - 1, m$; $n = 2m$, a similar but different set of
linear equations (3) results for the unknowns $\{\lambda_{in}\}$. Call the solution of this
system $\{b_{in}\}$. As implied above, only even values of $n$ are used in order to preserve
the symmetry, and values of $\{b_{in}\}$ are tabulated below for $n = 2, 4, 6$.
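
The linear system (3) for the exponential basis can be set up and solved directly. The sketch below is one possible way to do it, not the method used to prepare the tabulated values: it uses plain Gaussian elimination with partial pivoting and the exact integrals $\int_{-1}^{1} e^{jx}\,dx = 2\sinh(j)/j$ for $j \ne 0$. The function names `solve_linear` and `exponential_weights` are hypothetical.

```python
import math

def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting; returns the solution vector."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution

def exponential_weights(n, symmetric=False):
    """Weights making relation (1) exact for e^{jx}: j = 0..n, or j = -m..m with n = 2m."""
    if symmetric and n % 2:
        raise ValueError("the symmetric rule needs even n")
    m = n // 2
    nodes = [-1.0 + 2.0 * i / n for i in range(n + 1)]
    exponents = range(-m, m + 1) if symmetric else range(n + 1)
    matrix = [[math.exp(j * x) for x in nodes] for j in exponents]
    # Exact integral of e^{jx} over (-1, 1): 2 for j = 0, otherwise 2 sinh(j) / j.
    rhs = [2.0 if j == 0 else 2.0 * math.sinh(j) / j for j in exponents]
    return solve_linear(matrix, rhs)

print(exponential_weights(1))
print(exponential_weights(2, symmetric=True))
```

For $n = 1$ the system reduces to two equations and gives $a_{11} = (e^2 - 3)/(e^2 - 1)$, which provides a quick check on the solver.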


n = 1      a_{01} =  1.31303 5285
           a_{11} =  0.68696 4715

n = 2      a_{02} =  0.21805 032⁺       b_{02} = 0.32260 623⁻
           a_{12} =  1.49780 742        b_{12} = 1.35478 755
           a_{22} =  0.28414 226⁻       b_{22} = 0.32260 623⁻

n = 3      a_{03} =  0.51324 284
           a_{13} =  0.22445 055
           a_{23} =  1.08155 527
           a_{33} =  0.18075 134

n = 4      a_{04} = -0.13716 639⁺       b_{04} = 0.15048 171
           a_{14} =  1.40098 548        b_{14} = 0.73243 318
           a_{24} = -0.30895 914        b_{24} = 0.23417 022
           a_{34} =  0.91710 903        b_{34} = 0.73243 318
           a_{44} =  0.12803 103⁻       b_{44} = 0.15048 171

² Whittaker and Robinson, The Calculus of Observations, 4th Edition, (1946), London,
pp. 162-166.

tkl ** 

0.58919 

3 



«14 ** 

— 1.07544 

3 



a n ** 

2.12534 

« 



rt» «* 

—0.53595 

0 



ttli »■ 

0,79933 

8 



an » 

0.09852 

18 



Ot>4 -■= 

-0.83007 


ht ^ 0.00443 

5 

flie “ 

3.54128 


5m » 0.53464 

7 

(ht “ 

-3.88102 


bn *- 0.01139 

3 

at 5 * 

3.32254 


5m 0.71005 

0 


-0.94685 


b u » 0.01130 

3 

a« *» 

0.72075 


5m « 0.53464 

7 

«« “ 

0.07937 

5* 

5m 0.09143 

5 


The computing service of the Institute for Numerical Analysis has supplied the author
with most of the coefficients tabulated above.

3. Estimates of the error term. The choices of the coefficients $\{a_{in}\}$ and
$\{b_{in}\}$ are such that integration relation (1) is exact whenever

(7)    $f(x) = A_0 + A_1 e^{x} + \cdots + A_n e^{nx}$ and $\lambda_{in} = a_{in}$,

and whenever

(8)    $f(x) = B_{-m} e^{-mx} + B_{-m+1} e^{-(m-1)x} + \cdots + B_0 + \cdots + B_m e^{mx}$ and $\lambda_{in} = b_{in}$.

When $f(x)$ is not of these prescribed forms, the error in using correspondence (1)
may be of some importance. By making the transformation

(9)    $u = e^{x}, \qquad f(x) = f(\log u) = g(u)$

integration relation (1) becomes

(10)    $\int_{e^{-1}}^{e} g(u)\, \dfrac{du}{u} \approx \sum_{i=0}^{n} \lambda_{in}\, g(u_{in})$

where the $\{u_{in}\}$ are not evenly distributed. By approximating $g(u)$ by its Taylor's
series with a remainder term, the following expressions for the error in using
correspondence (1) can be obtained:

Using the coefficients $\{a_{in}\}$,

( 11 ) Error < ^rjjT [ 2 + 5 1 '] [-t£S, &T M . 

and, using the coefficients $\{b_{in}\}$,





( 12 ) 


Error 


iy« l 


(2m 4- 1) 


/ p" ~ e~ m y I6,,»ml 1 

or L » 




Neither of these error expressions can be said to be very practical in actual
computation, and neither appears suitable for establishing convergence properties
of the type

(13)    $\lim_{n \to \infty} \sum_{i=0}^{n} \lambda_{in} f(x_{in}) = \int_{-1}^{1} f(x)\, dx.$

However, both (11) and (12) reduce to zero when $f(x)$ is of the form prescribed
by (7) or (8) respectively.

4. Numerical examples. As illustrative numerical examples, the case $n = 4$
was selected and several typical functions were integrated approximately by the
positive power exponential rule, the symmetrical exponential rule and the
Newton-Cotes formula,

$\int_{-1}^{1} f(x)\, dx \approx \tfrac{1}{45}\,[7f(-1) + 32f(-\tfrac{1}{2}) + 12f(0) + 32f(\tfrac{1}{2}) + 7f(1)].$

Values of $\{a_{i4}\}$ and $\{b_{i4}\}$ are given in the tables in part 2. The typical functions
used were $x^2$, $e^{2x}$, $1/(x + 3)$, $e^{-x^2}$, $xe^{x}$, $x^6$, and $e^{2.2x}$. The following results were
obtained:

Function       Positive Power    Symmetrical     Newton-Cotes    8 Decimal
               Exponential       Exponential                     Approximation to
                                                                 Exact Value

x^2             .5703 8827        .6671 8001      .6666 6666      .6666 6667
e^{2x}         3.6268 6044       3.6268 6041     3.6317 3108     3.6268 6041
1/(x + 3)       .6828 6353        .6931 5792      .6931 7460      .6931 4718
e^{-x^2}       1.4930 1396       1.4857 2754     1.4887 4582     1.4936 4827
x e^{x}         .7296 4338        .7353 6007      .7361 7480      .7357 5888
x^6             .0270 8487        .3238 5196      .3333 3332      .2857 1429
e^{2.2x}       4.0527 7287       4.0530 7585     4.0607 7415     4.0519 1379
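
Entries of this kind can be reproduced by applying each five-point rule to a test function. The sketch below assumes the $n = 4$ coefficients as read from the tables in part 2 (to the eight decimals given there); the names `A4`, `B4`, `NC4`, `NODES` and `apply_rule` are introduced only for this illustration.

```python
import math

# n = 4 coefficients as read from the tables in part 2 (eight decimals).
A4 = [-0.13716639, 1.40098548, -0.30895914, 0.91710903, 0.12803103]   # positive power
B4 = [0.15048171, 0.73243318, 0.23417022, 0.73243318, 0.15048171]     # symmetrical
NC4 = [7 / 45, 32 / 45, 12 / 45, 32 / 45, 7 / 45]                     # Newton-Cotes

NODES = [-1.0, -0.5, 0.0, 0.5, 1.0]       # evenly spaced abscissae from (2) with n = 4

def apply_rule(weights, f):
    """Evaluate the weighted sum on the right-hand side of relation (1)."""
    return sum(w * f(x) for w, x in zip(weights, NODES))

tests = {
    "x^2": lambda x: x * x,
    "1/(x + 3)": lambda x: 1.0 / (x + 3.0),
    "x^6": lambda x: x ** 6,
    "x e^x": lambda x: x * math.exp(x),
}
for name, f in tests.items():
    print(name, apply_rule(A4, f), apply_rule(B4, f), apply_rule(NC4, f))
```

Because the coefficients are rounded to eight decimals, the last printed digit may differ slightly from values computed with unrounded weights.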


From thi. tabulation, it would 
method compares favorably vnth the NovTo 

functions an l/(x + 3), r-, a-, **, and N«t»i- 

is not really a fair choice when comparing t symmetrical exponen- 

Cotes is derived so as to give exactness for ** and the symmetn 

tial so as to give exactness for e 8 *. 





SMOOTHEST APPROXIMATION FORMULAS
By Arthur Sard¹

Queens College

Introduction. Consider a process of approximation which operates on a
function $x = x(t)$. The error in the process may be thought of as a sum $R + \delta A$,
where $R$ is the error that would be present if $x$ were exact and $\delta A$ is the error due
to errors in $x$. (Precise definitions are given below.) Suppose that one wishes to
choose one process $A$ from a class $\mathcal{A}$ of processes. In some situations it is
appropriate to base the choice on $R$ alone²; in others it is appropriate to consider $\delta A$.

The primary purpose of the present note is to formulate a criterion of smoothest
approximation: That $A$ in $\mathcal{A}$ is smoothest which minimizes the variance of
$\delta A$. A criterion based on both $R$ and $\delta A$ is also suggested. (Sections 1 and 2.)
Smoothest approximate integration formulas of one type are derived in Section 3.

Progress in the technique of estimating the covariance function of the errors
in $x$ will lead to further applications of the criterion of smoothest approximation.
1. Approximation of a functional. Suppose that X is a space of functions
x = x(t) each of which is continuous on $a \le t \le b$. Let f[x] be a functional
defined on X; that is, f[x] is a real number defined for each $x \in X$. For example,
X might be the space of functions with second derivatives on [a, b] and f[x] might
be x''(u), where u is a fixed number in [a, b].

Suppose that f[x] is to be approximated by a Stieltjes integral

(1)    $A = \int_a^b x(t)\,d\alpha(t), \qquad x \in X,$

where $\alpha$ is a function of bounded variation. The remainder in the approximation
of f[x] by A is

$R = A - f[x].$

If the approximation (1) operates on $x + \delta x$ instead of x, the result is
$A + \delta A = \int_a^b (x + \delta x)\,d\alpha$; and the error in the approximation of f[x] by
$A + \delta A$ is $R + \delta A$, where

(2)    $\delta A = \int_a^b \delta x(t)\,d\alpha(t).$

Consider a class $\mathcal{C}$ of approximations A, each of the form (1). We shall propose
a criterion for characterizing the "smoothest A" in $\mathcal{C}$, relative to the covariance
function of the errors $\delta x$.


¹ The author gratefully acknowledges financial support received from the Office of Naval
Research.

² "Best approximate integration formulas; best approximation formulas," Amer. Jour.
of Math., Vol. 71 (1949), pp. 80-91.







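Assume that $\delta x = \delta x(t)$ is a stochastic process with mean zero³ and covariance
function $\sigma(t, u) = E[\delta x(t)\,\delta x(u)]$. Then, by (2), $\delta A$ is a stochastic variable; and⁴

(3)    $E\,\delta A = E \int_a^b \delta x\,d\alpha = \int_a^b 0\,d\alpha = 0,$

       $E(\delta A)^2 = v = E\left[\int_a^b \delta x(t)\,d\alpha(t) \int_a^b \delta x(u)\,d\alpha(u)\right] = \int_a^b\!\!\int_a^b \sigma(t, u)\,d\alpha(t)\,d\alpha(u).$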


Criterion. That A (if any) in $\mathcal{C}$ is smoothest which minimizes the variance
v of $\delta A$.

In particular cases this criterion (least squares) has been proposed and used
by Chebyshev and others. An application to approximate integration is given in
Section 3 below.

One may extend this discussion to cases in which the approximations A
involve derivatives of x.

Remark. The criterion of best approximation² may be combined with the
above criterion of smoothest approximation as follows: That A (if any) in $\mathcal{C}$ is
the best compromise which minimizes a specified combination of the variance
of $\delta A$ and the modulus of R. Here it is assumed that the remainders R satisfy the
conditions for the existence of the modulus.²


2. Approximation of a function. One may extend the preceding discussion
to the case in which y = f[x] is an operation to a space of functions y = y(u),
and in which the approximation of f[x] is

$A = \int_a^b x(t)\,d_t\alpha(t, u), \qquad x \in X,$

where, for each u, $\alpha$ is a function of bounded variation in t. Then, for each u,
$\delta A$ has a variance v(u). Criterion. That A (if any) in a class of approximations is
smoothest which minimizes v(u) for all u; failing such an A, that A (if any) is
smoothest which minimizes the integral of v(u), or alternatively, the supremum
of v(u), over the range of u.


3. Smoothest approximate integration formulas in a particular case.⁵ Let m
and n be fixed integers; $m \ge 1$, $n \ge 0$. Let $\mathcal{C} = \mathcal{C}_{m,n}$ be the class of
approximations

³ The essential point here is that $E\,\delta x = m(t)$ be known for each t; for given m(t), one
could and would replace $x + \delta x$ by $x + \delta x - m$.

⁴ We assume here that the integrals in (3) exist and that the interchanges of E and $\int d\alpha$
are valid. For this it is sufficient that $\delta x$ be integrable relative to the product measure
for all functions $\alpha$ corresponding to elements of $\mathcal{C}$, where $\omega$ is the measure on the underlying
probability space relative to which E is the operator $\int d\omega$. Cf. J. L. Doob, "Probability in
function space," Bull. Amer. Math. Soc., Vol. 53 (1947).

⁵ The approximate integration formulas of this section are of so simple a nature that one
would expect them to be known. The values of J at the end are probably new.





of the form

$\int_{-m/2}^{m/2} x(t)\,dt = f[x]$

by sums

$A = \sum_i b_i\,x(i),$

the m + 1 constants $b_i$ being such that A = f[x] whenever x(t) is a polynomial of
degree n. Throughout this section i is to range over the m + 1 values i = -m/2,
-m/2 + 1, ..., +m/2. Suppose that the errors $\delta x(t)$ are independent, with
common variance $\sigma^2$, and with mean zero. Then $\alpha(t)$ is a step function with jumps
$b_i$ at t = i; and

$v = \sigma^2 \sum_i b_i^2.$


The smoothest approximation in $\mathcal{C}_{m,n}$ is the one for which v is a minimum.
(The m + 1 variables $b_i$ in v are subject to n + 1 constraints due to the condition
that the approximation be exact for degree n. The set $\mathcal{C}_{m,n}$ is empty if and only
if m is less than the largest even integer contained in n.)
If n = 0 or 1, the smoothest formula in $\mathcal{C}_{m,n}$ is the one for which all the
coefficients are equal:

$b_i = m/(m + 1);$

in which case

$v = m^2\sigma^2/(m + 1).$


If n = 2 or 3, the smoothest formula in $\mathcal{C}_{m,n}$ is characterized by the following
relations:

$b_i = \lambda_0 + i^2\lambda_1,$

$\lambda_0 = m(2m^2 + 9m - 6)/2(m - 1)(m + 1)(m + 3),$

$\lambda_1 = -30m/(m - 1)(m + 1)(m + 2)(m + 3);$

in which case

$v/\sigma^2 = \lambda_0 m + \lambda_1 m^3/12.$

Thus, the smoothest approximation in $\mathcal{C}_{6,2}$ or in $\mathcal{C}_{6,3}$ is the following:

$A = \tfrac{1}{2}[x(-3) + x(3)] + \tfrac{6}{7}[x(-2) + x(2)] + \tfrac{15}{14}[x(-1) + x(1)] + \tfrac{8}{7}\,x(0).$
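The relations for n = 2 or 3 are easy to evaluate in exact arithmetic. The sketch below is illustrative only: the function name `smoothest_formula` is hypothetical, and the formulas for $\lambda_0$ and $\lambda_1$ are taken as read from the text (with $\lambda_1 = -30m/(m-1)(m+1)(m+2)(m+3)$). It computes the coefficients $b_i$ for m = 6 and checks the two exactness constraints, $\sum b_i = m$ and $\sum i^2 b_i = m^3/12$, together with the stated value of $v/\sigma^2$.

```python
from fractions import Fraction

def smoothest_formula(m):
    """Return (lambda0, lambda1, coefficients) for n = 2 or 3 and integer m >= 2."""
    if m < 2:
        raise ValueError("the relations for n = 2 or 3 need m >= 2")
    M = Fraction(m)
    lam0 = M * (2 * M**2 + 9 * M - 6) / (2 * (M - 1) * (M + 1) * (M + 3))
    lam1 = -30 * M / ((M - 1) * (M + 1) * (M + 2) * (M + 3))
    # Nodes i = -m/2, -m/2 + 1, ..., +m/2
    nodes = [Fraction(-m, 2) + j for j in range(m + 1)]
    coeffs = {i: lam0 + i * i * lam1 for i in nodes}
    return lam0, lam1, coeffs

lam0, lam1, b = smoothest_formula(6)
for node in sorted(b):
    print(f"b({node}) = {b[node]}")

sum_b = sum(b.values())                      # should equal m
sum_i2_b = sum(i * i * c for i, c in b.items())  # should equal m^3 / 12
v_over_sigma2 = sum(c * c for c in b.values())   # sum of squared coefficients
print(sum_b, sum_i2_b, v_over_sigma2)
```

For m = 6 the coefficients come out as 1/2, 6/7, 15/14, 8/7, 15/14, 6/7, 1/2, which sum to 6.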

By the method of Lagrange's multipliers, one may establish the following
relations for the smoothest formula in $\mathcal{C}_{m,n}$. Here i has the same range of values
as before; $\mu$ and $\nu$ range over 0, 1, ..., [n/2].

$b_i = \sum_\mu \lambda_\mu\, i^{2\mu},$

$v/\sigma^2 = \sum_\mu \lambda_\mu c_\mu,$

where

$c_\mu = m^{2\mu+1}/2^{2\mu}(2\mu + 1),$

and the $\lambda_\mu$ are determined by the equations

$\sum_\mu \lambda_\mu \sum_i i^{2\mu+2\nu} = c_\nu.$

» < 

The class $\mathcal{C}_{m,n}$ is such that for each $A \in \mathcal{C}_{m,n}$ there is a function k(t) with the
following property:

$R = A - f[x] = \int_{-m/2}^{m/2} x^{(n+1)}(t)\,k(t)\,dt,$

whenever x is a function with continuous (n + 1)th derivative. The quantity

$J = \int_{-m/2}^{m/2} k(t)^2\,dt$

is useful in appraising R, since

$R^2 \le J \int_{-m/2}^{m/2} [x^{(n+1)}(t)]^2\,dt,$

by Schwarz's inequality.

Values of J for the smoothest formulas are as follows.

n = 0:  $J = m^2/6(m + 1).$

n = 1:  $J = m^2(3m^2 + 2m + 1)/360(m + 1).$

For n = 2 and 3, and $m \le 6$, the numerical values of J are as follows.

m      J (n = 2)           J (n = 3)

2      1/1,890             1/9,072
3      11/8,960            13/17,920
4      134/33,075          62,539/13,891,500
5      1,865/150,528       136,223/6,322,176
6      8/245               6,683/82,320

For the method of calculation of J, as well as the transformation of J under a
linear transformation of t, the reader may consult the paper².





ON THE POWER FUNCTION OF THE "BEST" t-TEST SOLUTION OF THE
BEHRENS-FISHER PROBLEM

By John E. Walsh

The Rand Corporation

1. Introduction. The Behrens-Fisher problem is concerned with significance
tests for the difference of the means of two normal populations when the ratio
of the variances of the populations is unknown. Denote one population by
$N(a_1, \sigma_1^2)$ and the other by $N(a_2, \sigma_2^2)$, where the notation $N(a, \sigma^2)$ represents a
normal population with mean a and variance $\sigma^2$. Let m sample values be drawn
from $N(a_1, \sigma_1^2)$ and n sample values from $N(a_2, \sigma_2^2)$, where m < n. Then Scheffé
[1] has shown that certain optimum properties are possessed by a t-test solution
he proposed for the Behrens-Fisher problem, in which the numerator of t is based
on the difference of the means of the samples while the denominator is based on
the square root of a function of the sample values which has a $\chi^2$-distribution
with m - 1 degrees of freedom. The purpose of this note is to compare the power
function of this t-test with the power function of the corresponding most powerful
test for the case in which the ratio of variances $\sigma_1^2/\sigma_2^2$ is also known (only
one-sided and symmetrical tests are considered). This comparison is made by
computing the power efficiency (see section 2 for definition) of Scheffé's t-test.

It is sufficient to limit power efficiency investigations to one-sided tests. As
shown in [2], a symmetrical t-test with significance level $2\alpha$ has the same power
efficiency as the corresponding one-sided t-test with significance level $\alpha$. Equation
(2) of section 2 furnishes an explicit formula whereby approximate power
efficiencies can be computed for a wide range of values of $\alpha$, m, n. Table 1 contains
values of (2) for $\alpha$ = .05, .01 and several values of m and n.

For the situation considered here, a power efficiency of 100r% has the
quantitative interpretation that the given test based on samples of sizes m and n has
approximately the same power function as the corresponding most powerful
test based on samples of sizes rm and rn. Intuitively the power efficiency
of a test measures the percentage of available information per observation
which is utilized by that test.

2. Power efficiency derivations. The basic notion of the power efficiency of
a significance test is given in [2]. For the present case the problem is to determine
the value r such that a most powerful test of the same hypothesis (same
significance level) based on rm and rn sample values will have approximately the
same power function as the given t-test based on m and n sample values (from
$N(a_1, \sigma_1^2)$ and $N(a_2, \sigma_2^2)$ respectively). Here the value of $\sigma_1^2/\sigma_2^2$ is assumed to be
known. Then the power efficiency of the given t-test equals 100r%.

If the ratio of variances $\sigma_1^2/\sigma_2^2$ is known, the most powerful significance test
(one-sided and symmetrical) for the difference of means of the two normal
populations is a t-test where the numerator of t is based on the difference of the
two sample means while the denominator is based on the square root of a function
of the sample values and $\sigma_1^2/\sigma_2^2$ which has a $\chi^2$-distribution with m + n - 2
degrees of freedom ([1], p. 35). Thus the problem is that of comparing the power
functions of two t-tests.

TABLE 1

Percentage Power Efficiencies for Certain Values of m and n

(The body of the table, beginning with the panel for $\alpha$ = .05, is not recoverable from this copy.)

As stated in section 1, it is sufficient to consider one-sided tests. We find, using
a modification of the normal approximation to the power function of a one-sided
t-test given in [3], that Scheffé's one-sided t-test for the Behrens-Fisher problem
and the corresponding most powerful one-sided test ($\sigma_1^2/\sigma_2^2$ known) have
approximately the same power function when r is chosen so that

$K_\alpha - \delta\sqrt{r}\,[1 - K_\alpha^2/2((m + n)r - 2)]^{1/2} = K_\alpha - \delta\,[1 - K_\alpha^2/2(m - 1)]^{1/2},$

where $\alpha$ is the significance level of the tests, $K_\alpha$ is the value of the standardized
normal deviate exceeded with probability $\alpha$, and $\delta$ is a function of m, n,
$a_1$, $a_2$, $\sigma_1^2$, $\sigma_2^2$ and the given hypothetical value of $a_1 - a_2$ being tested. This
condition for the approximate equality of the power functions is reasonably
accurate for the following cases: $\alpha$ = .05, m > 4; $\alpha$ = .025, m > 5; $\alpha$ = .01,
m > 6; $\alpha$ = .005, m > 7. The accuracy of the approximation increases as m
increases.

Hence a value of r such that the two power functions are approximately equal
is determined by the equation

(1)    $r\,[1 - K_\alpha^2/2((m + n)r - 2)] = 1 - K_\alpha^2/2(m - 1).$

Let

$A = A(m, \alpha) = 1 - K_\alpha^2/2(m - 1).$

Then solving (1) for the appropriate root yields

$r = \frac{1}{2(m + n)} \left\{ 2 + (m + n)A + K_\alpha^2/2 + \sqrt{[2 + (m + n)A + K_\alpha^2/2]^2 - 8(m + n)A} \right\}.$

Thus the power efficiency of Scheffé's one-sided t-test solution to the
Behrens-Fisher problem, for the case in which the ratio of the variances is also known, is
approximately equal to

$\frac{50}{m + n} \left\{ 2 + (m + n)A + K_\alpha^2/2 + \sqrt{[2 + (m + n)A + K_\alpha^2/2]^2 - 8(m + n)A} \right\} \%$

for suitable values of $\alpha$ and m.
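The closed form for r can be evaluated numerically. The sketch below is only an illustration: the function names `efficiency_ratio` and `power_efficiency_percent` are hypothetical, and $K_\alpha$ is taken to be the upper $\alpha$ point of the standard normal distribution, obtained here from Python's `statistics.NormalDist`. It computes r as the larger root of equation (1) and verifies that the result satisfies (1).

```python
from math import sqrt
from statistics import NormalDist

def efficiency_ratio(alpha, m, n):
    """Larger root r of r*[1 - K^2 / (2((m+n) r - 2))] = 1 - K^2 / (2(m-1))."""
    k2 = NormalDist().inv_cdf(1.0 - alpha) ** 2   # K_alpha squared
    a = 1.0 - k2 / (2.0 * (m - 1))                # A(m, alpha)
    total = m + n
    s = 2.0 + total * a + k2 / 2.0
    r = (s + sqrt(s * s - 8.0 * total * a)) / (2.0 * total)
    return r, a, k2

def power_efficiency_percent(alpha, m, n):
    """Approximate power efficiency 100 r, in percent."""
    return 100.0 * efficiency_ratio(alpha, m, n)[0]

# Illustrative sample sizes (not taken from Table 1, whose body is not available here)
r, a, k2 = efficiency_ratio(0.05, 10, 20)
residual = r * (1.0 - k2 / (2.0 * ((10 + 20) * r - 2.0))) - a
print(f"r = {r:.4f}, efficiency = {100 * r:.1f}%, residual of (1) = {residual:.1e}")
```

The larger root is used because it tends to 1 as $K_\alpha \to 0$, whereas the smaller root tends to 2/(m + n).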

REFERENCES

[1] Henry Scheffé, "On solutions of the Behrens-Fisher problem based on the t-distribution,"
Annals of Math. Stat., Vol. 14 (1943), pp. 35-44.

[2] John E. Walsh, "Some significance tests for the median which are valid under very
general conditions," Annals of Math. Stat., Vol. 20 (1949), pp. 64-81.

[3] N. L. Johnson and B. L. Welch, "Applications of the non-central t-distribution,"
Biometrika, Vol. 31 (1940), p. 376.




A NOTE ON FISHER'S INEQUALITY FOR BALANCED INCOMPLETE
BLOCK DESIGNS

By R. C. Bose

Institute of Statistics, University of North Carolina

1. An experimental design in which v varieties or treatments are arranged in
b blocks is called a balanced incomplete block design if

(i) Each block has exactly k treatments (k < v), no treatment occurring twice
in the same block.

(ii) Each treatment occurs in exactly r blocks.

(iii) Any two treatments occur together in exactly $\lambda$ blocks.

It is easy to see that the parameters v, b, r, k, $\lambda$ of the design satisfy the
relations

(1.0)    $bk = vr,$

(1.1)    $\lambda(v - 1) = r(k - 1).$

Also it is readily seen that

(1.2)    $r > \lambda,$

for otherwise with any given treatment every other treatment would occur in
every block. This would make k = v, and the design would become a 'randomised
block design'.

Fisher (1940) showed that a necessary condition for the existence of a
balanced incomplete block design with v treatments and b blocks is

(1.3)    $b \ge v.$

It is the object of this note to give a very simple proof of Fisher's inequality.

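2. Consider a balanced incomplete block design with parameters

(2.0)    $v,\ b,\ r,\ k,\ \lambda$

and let

(2.1)    $n_{ij} = 1 \text{ or } 0$

according as the ith treatment does or does not occur in the jth block. Clearly

(2.2)    $\sum_{j=1}^{b} n_{ij}^2 = r,$

(2.3)    $\sum_{j=1}^{b} n_{ij}\,n_{i'j} = \lambda \qquad (i \ne i').$

If possible let b < v. Then by adjoining v - b columns of zeros to the matrix
$(n_{ij})$ we obtain a square matrix of order v,

(2.4)    $N = \begin{pmatrix} n_{11} & \cdots & n_{1b} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ n_{v1} & \cdots & n_{vb} & 0 & \cdots & 0 \end{pmatrix}.$

From (2.2) and (2.3),

(2.5)    $NN' = \begin{pmatrix} r & \lambda & \cdots & \lambda \\ \lambda & r & \cdots & \lambda \\ \vdots & \vdots & & \vdots \\ \lambda & \lambda & \cdots & r \end{pmatrix},$

where N' denotes the transpose of N. Hence

(2.6)    $\det(NN') = \{r + \lambda(v - 1)\}(r - \lambda)^{v-1} = kr(r - \lambda)^{v-1}$

from (1.1). But

(2.7)    $\det(NN') = \det N \,\det N' = 0.$

This makes $r = \lambda$, and contradicts (1.2). Hence the assumption b < v is wrong,
and we must have

(2.8)    $b \ge v.$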
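The determinant identity (2.6) can be checked on a small design. The sketch below is an illustration only: it uses the seven-point projective plane (v = b = 7, r = k = 3, $\lambda$ = 1) as the example design, which is not mentioned in the note, and the helper names `det` and `gram` are hypothetical. It forms NN' from the incidence matrix, confirms the pattern of (2.5), and compares the determinant with $kr(r - \lambda)^{v-1}$. It also shows that a square matrix with a column of zeros has zero determinant, which is the fact used in (2.7).

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result          # a row swap changes the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return result

def gram(n_matrix):
    """Return N N' for a list-of-rows matrix N."""
    return [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in n_matrix]
            for row_i in n_matrix]

# Example design (an assumption for illustration): v = b = 7, r = k = 3, lambda = 1
blocks = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
v, r, k, lam = 7, 3, 3, 1
N = [[1 if i in block else 0 for block in blocks] for i in range(v)]

NNt = gram(N)
print(det(NNt), k * r * (r - lam) ** (v - 1))

# Replacing one block by a column of zeros (as in the padding step) gives det N = 0
N_padded = [row[:-1] + [0] for row in N]
print(det(N_padded))
```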

REFERENCES

[1] R. A. Fisher, "An examination of the different possible solutions of a problem in
incomplete blocks," Annals of Eugenics, London, Vol. 10 (1940), pp. 52-75.

[2] F. Yates, "Incomplete randomised blocks," Annals of Eugenics, London, Vol. 7 (1936),
pp. 121-140.


ABSTRACTS OF PAPERS 

(Presented September 1, 1949 at Boulder at the Twelfth Summer Meeting of the Institute)

1. Structure of Statistical Elements. Duane M. Studley, Foundation Research,
Colorado Springs, Colorado.

Research in logical semantics and in practical elementation has set forth the proposition
that all words and ideas have set form. As a consequence of this universal proposition
all notions and conceptions in statistics should be accessible to set-theoretic analysis and
interpretation. This paper explains the results of a preliminary analysis performed on
statistical notions and conceptions with a view to a proper organization of definitions and
conceptions which will, it is hoped, make possible a better and simpler construction of
statistics from a system of basic notions.





2. On the Relative Efficiencies of BAN Estimates. Leo Katz, Michigan State
College, East Lansing, Michigan.

J. Neyman, in the Proceedings of the Berkeley Symposium on Mathematical Statistics
and Probability, 1949, proved that $\chi^2$ minimum estimates with either of two alternative
definitions of $\chi^2$ are efficient, as also are the maximum likelihood estimates. He also raised
the question whether some of these estimates were better than others. This paper bears
on that question. In making $\chi^2$ minimum estimates, it is often necessary to avoid small
frequencies by grouping together at least one tail of the distribution. It is with respect to
the parameters of these modified distributions that the $\chi^2$ estimates are efficient. Define
relative efficiency in these circumstances as the ratio of the variance of an efficient estimator
in the unmodified case to that of one in the modified case. It is shown that, except for a
rectangular probability law, the relative efficiency is < 1 and, further, it decreases as the tail
grouping is made wider. Formulae are given for the relative efficiencies of $\chi^2$ minimum
estimators for Binomial and Poisson probability laws and some representative values
computed to exhibit these effects.


3. Adjustment of an Inverse Matrix Corresponding to Changes in the Elements
of a Given Column or a Given Row of the Original Matrix. Jack Sherman
and Winifred J. Morrison, The Texas Company Research Laboratories,
Beacon, New York.

A simple computational procedure is derived for obtaining the elements $b'_{ij}$ of an nth
order matrix (B') which is the inverse of (A'), directly from the elements $b_{ij}$ of a matrix
(B) which is the inverse of (A), when (A') differs from (A) only in the elements of one
column, say the Sth column. The equations which form the basis of the computation are:

$b'_{Sj} = \frac{b_{Sj}}{\sum_{r=1}^{n} b_{Sr}\,a'_{rS}}, \qquad j = 1, 2, \cdots, n,$

$b'_{ij} = b_{ij} - b'_{Sj} \sum_{r=1}^{n} b_{ir}\,a'_{rS}, \qquad i = 1, 2, \cdots, S - 1, S + 1, \cdots, n; \quad j = 1, 2, \cdots, n.$

Analogous equations are derived for the case that A and A' differ in the elements of a
given row rather than a column.
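The column-replacement update can be tried on a small matrix. The following sketch is illustrative: the function names `update_inverse_for_column` and `matmul` and the 3 by 3 example are not from the abstract, the column index `s` is zero-based (the abstract counts columns from 1), and the update equations are assumed to be $b'_{Sj} = b_{Sj}/\sum_r b_{Sr}a'_{rS}$ and $b'_{ij} = b_{ij} - b'_{Sj}\sum_r b_{ir}a'_{rS}$ for $i \ne S$.

```python
from fractions import Fraction

def update_inverse_for_column(B, new_column, s):
    """Inverse of A' from B = inverse of A, where A' is A with column s replaced."""
    n = len(B)
    # c_i = sum_r b_ir * a'_rS : the new column expressed through the old inverse
    c = [sum(B[i][r] * new_column[r] for r in range(n)) for i in range(n)]
    if c[s] == 0:
        raise ZeroDivisionError("the modified matrix is singular")
    row_s = [B[s][j] / c[s] for j in range(n)]
    return [row_s if i == s else [B[i][j] - c[i] * row_s[j] for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

# Example: A and its known inverse B (exact rational arithmetic)
A = [[Fraction(x) for x in row] for row in [[1, 2, 0], [0, 1, 0], [0, 0, 1]]]
B = [[Fraction(x) for x in row] for row in [[1, -2, 0], [0, 1, 0], [0, 0, 1]]]

new_column = [Fraction(1), Fraction(1), Fraction(2)]
s = 2  # replace the third column (zero-based index 2)
A_new = [row[:s] + [new_column[i]] + row[s + 1:] for i, row in enumerate(A)]
B_new = update_inverse_for_column(B, new_column, s)

identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(matmul(A_new, B_new) == identity)
```

The update costs on the order of $n^2$ operations, against roughly $n^3$ for inverting A' from scratch.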

4. On the Problem of Optimum Classification. Paul G. Hoel, University of
California at Los Angeles.

Let $f_i$ (i = 1, 2, ..., k) be the probability density function of population i and let
$p_i$ be the probability that population i will be sampled. Assume certain differentiability
conditions and moment properties. Then, for known parameters, the probability of a
correct classification will be maximized by choosing the region $M_i$, which corresponds to
classifying into population i, as that part of variable space where $p_i f_i \ge p_j f_j$ (j = 1, 2, ..., k).
If the parameters are unknown, an asymptotically optimum set of estimates will be
given by the set that minimizes a certain form in the covariances. Among uncorrelated
estimates, maximum likelihood estimates are seen to be asymptotically optimum.

If weight functions, $W_{ij}$, are introduced and the expected value of the loss is minimized,
the same methods of proof show that the region $M_i$ becomes that part of variable space
where $\sum_r p_r f_r (W_{rj} - W_{ri}) \ge 0$, (j = 1, 2, ..., k), and that the criterion for an
asymptotically optimum set of estimates is of the same form as the preceding criterion.





5. Optimal Linear Prediction of Stochastic Processes whose Covariances are
Green's Functions. C. L. Dolph and M. A. Woodbury, University of Michigan,
Ann Arbor.

A method of unbiased, minimum variance, linear prediction is developed for problems
similar to those of prediction and filtering treated by Wiener. It differs from these in that
the unbiased condition is imposed, only a finite part of the past is employed, and no
stationary assumption is used. It is shown that the special stationary case discussed by
Cunningham and Hynd, "Random Processes in Problems of Air Warfare" (Supp. Journal Royal
Stat. Soc., 1946) succeeds because the correlation function, well known to be that of
the process defined by the Langevin equation, is the Green's function of the homogeneous
differential equation formed by letting the adjoint differential operator of the Langevin
equation operate on the operator of this equation. This relationship is shown to persist
for any physically stable linear differential equation driven by "white noise." The
well-known equivalence between integral and differential equations is then extended by use of
Stieltjes integrals and used to effect the solutions of the integral equations of the first kind
which yield the "optimum" linear prediction. The non-stationary example consisting of
purely random motion about a mean linear path in the presence of radar type errors is
treated in detail.


6. The Integral of the Gaussian Distribution over the Area Bounded by an
Ellipse. H. H. Germond, RAND Corporation, Santa Monica, California.

This paper describes the preparation of tables from which to obtain the integral of a
bivariate Gaussian distribution over the area of an ellipse. The center of the ellipse need
not coincide with the mean of the Gaussian distribution, nor need the axes of the ellipse
have any special orientation with respect to the Gaussian distribution.

7. Theorems on Convergency of Compound Distributions with Symmetric
Components. (By title) Maria Castellani, University of Kansas City.

The purpose of this paper is to present some results obtained when operations of
convolution in $R_1$ are concerned with a specific family of distributions. The compound
distribution K(x) = F(x) * G(x) is here obtained by combining any d.f. F(x) with a d.f. G(x) under
the restriction of symmetry, i.e., G(h) + G(-h) = 1 for any h > 0.

A generalization of Cantelli's Inequalities will enable us to write a preliminary theorem
on the following upper and lower bounds:

$F(a - h) - 2\int_h^\infty dG(y) < K(a) < F(a + h) + 2\int_h^\infty dG(y),$

$K(a - h) - 2\int_h^\infty dG(y) < F(a) < K(a + h) + 2\int_h^\infty dG(y),$

where a is any point in $R_1$ and h > 0.

The theorem is derived assuming the Stieltjes integral,

$\int_{-\infty}^{\infty} F(a - y)\,dG(y),$

is taken as a sum of three integrals connected with three convenient intervals (— • , —A), 
(—A, A), (A, +«). When the symmetric component of the convolution is a member of a fam- 





2 r* 

il y „f normal dialnhuhnns «' irh » " v / r J «~“ V dp, where 

(unrtrT, the urn- of fwiirlliV In-Tialmi- **ve 

•i , 

K„<« - h) - “ J / * “ •*« < H*> ~ Kja, 


«is an arbitrary p a r- 


■ A« f n f A) - A'„(a) f 


■hi 


du t 


where K.(*i - /'l JJ * 

The d.f. K»(ii is a continuous 


— r ,U f ?' ,r,U, "' n R ' * WllU R fr - M*) which is everywhere 
uniformly conUmmua. For an arbitrarily email o > 0, a convenient small h and Lee « 
may be found which will enable us to prove the following two theorems: 

Theorem 1 (liven any d.f Fin n% H, , there exist* a convenient continuous d t. K (x) 
which fora -» * ronverge* tu«ypm(*iiirnlly and uniformly almost everywhere to the given 
cl f 

Theorem 2: Given any d.f. F(x) in $R_1$, there exists in any continuity bordered interval
a convenient uniformly convergent series of continuous functions which asymptotically
approach the given F(x).


8. Partial Sums of the Negative Binomial in terms of the Incomplete
Beta-Function. (By title) Julius Lieblein, Statistical Engineering Laboratory,
National Bureau of Standards.

In acceptance sampling a certain size sample is taken at random from a lot of items and
the lot is accepted if the number of defective items does not exceed a predetermined number
characteristic of the sampling plan. The Statistical Engineering Laboratory has been
studying the probabilities that a decision to accept or reject can be made before the sample
is completely inspected. Such probabilities are found to involve certain sums apparently
not previously treated. In this note the author proves a simple identity connecting these
sums which greatly facilitates their computation and shows how they may be written in
terms of the well known incomplete Beta function of Karl Pearson, for which extensive
tables are available.

9. Large Sample Tests and Confidence Intervals for Mortality Rates. (By title)
John E. Walsh, RAND Corporation, Santa Monica, California.

In computing mortality rates from insurance data, the unit of measurement used is
frequently based on number of policies or amount of insurance rather than on lives. Then
the death of one person may result in several units of "death" with respect to the
investigation; moreover, the number of units per individual may vary noticeably. Thus the usual
large sample methods of obtaining significance tests and confidence intervals for the true
value of the mortality rate are not applicable in this case. If the number of units
associated with each person in the investigation were known, accurate large sample results
could be obtained; however, determination of the number of units associated with each
individual would require an extremely large amount of work. This article presents some
valid large sample tests and confidence intervals for the mortality rate which do not
require much work and are reasonably efficient. The procedure followed consists in first
dividing the risks into twenty-six subgroups on the basis of the first letter of the last name
of the person insured. Some of the groups are then combined until 10 to 15 subgroups
yielding approximately the same number of units are obtained. The fraction consisting of
the total number of units paid divided by the total number of units exposed is computed
for each of the resulting subgroups. These fractions approximately represent independent
observations from continuous symmetrical populations with common median equal to the
true value of the mortality rate. Tests and confidence intervals for the true value of the
mortality rate are obtained by applying the results of the paper "Some Significance Tests for the
Median which are Valid Under Very General Conditions" (Annals of Math. Stat., Vol. 20
(1949), pp. 64-81) to these observations.


NEWS AND NOTICES 

Readers are invited to submit to the Secretary of the Institute news items of interest

Personal Items 

Mr. Fred C. Andrews will be a teaching assistant in the Statistical Laboratory,
Department of Mathematics, University of California for the academic year
1949-1950.

Dr. Joseph Berkson has been promoted to the rank of Professor in the University
of Minnesota Graduate School and Mayo Foundation. He continues as
Chief of the Division of Biometry and Medical Statistics of the Mayo Clinic.

Mr. Colin R. Blyth is now a research assistant at the University of California,
Statistical Laboratory, Berkeley.

Mr. Clyde A. Bridger is now Director of the Section of Statistics and State
Registrar of Vital Statistics for the Division of Health of Missouri.

Mr. Loren V. Burns, formerly with the MPA Milling Company at Springfield,
Missouri, has been made Vice-President and Technical Director of the Spear
Mills, Inc., Kansas City 6, Missouri.

Professor Douglas Chapman, who obtained his Ph.D. in statistics at the University
of California, Berkeley, has accepted an appointment as Assistant Professor
at the University of Washington in the Department of Mathematics and
the Laboratory of Statistical Research.

Dr. Andrew Laurence Comrey, who received his doctor's degree from the University
of Southern California last June, has accepted an assistant professorship
in the Department of Psychology at the University of Illinois.

Dr. Donald A. Darling has been appointed to an instructorship in the Department
of Mathematics, University of Michigan.

Dr. Paul M. Denson resigned his position as Chief of the Division of Medical
Research Statistics of the Department of Medicine and Surgery of the Veterans
Association as of July 1, 1949 to join the staff of the Graduate School of Public
Health, University of Pittsburgh, as an Associate Professor of Biostatistics.

Mr. Amron H. Katz has been promoted to the position of Chief Physicist of
the Photographic Laboratory, Engineering Division, Air Material Command,
Wright Patterson Air Force Base, Dayton, Ohio.

Associate Professor Louis Guttmann, who had been on leave for two years from
the Department of Sociology of Cornell University conducting a research program
in Israel, was invited to remain in Israel for another year to direct the





activities of the recently founded Israel Institute of Public Opinion Research. 
He is serving as Chief Consultant. 

Mr. Home Ernest L^Foutsat who was attending the University of Michigan, 
during the academic year 1948-1949 working on his doctor’s degree, has accepted 
a position as statistician for the B.T.W. Insurance Co. at Birmingham, Alabama. 

Assistant Professor Jerome C. R. Li has been promoted to Associate Professor
of Mathematics at the Oregon State College, Corvallis, Oregon.

Professor H. B. Mann of Ohio State University has accepted a visiting
professorship and research associateship at the Statistical Laboratory at
Berkeley, California for the year 1949-1950.

Dr. Gottfried E. Noether has been appointed to an instructorship at New York
University.

Dr. G. R. Seth has just returned from a trip to England, Sweden, France and
India where he visited statistical institutions.

Assistant Professor Andrew Sobczyk has been promoted to Associate Professor
of Mathematics at Boston University.

Dr. Zenon Szatrowski, formerly with the Economics Department of North¬ 
western University, has accepted an associate professorship in the School of 
Business Administration, University of Buffalo. 

Professor Gerhard Tintner has returned to his teaching and research duties at 
Iowa State College after spending a year at the Department of Applied Eco¬ 
nomics at the University of Cambridge, England. He gave a course on Econ¬ 
ometrics at the University of Cambridge and during his stay in Europe, he 
lectured on econometric and statistical subjects in Universities at Bristol, Dublin, 
Hull, Paris, Manchester and Uppsala. 

Dr. A. E. R. Westman, Director of the Department of Chemistry, Ontario
Research Foundation, left in September, 1949 for England where he is visiting
industrial research laboratories and engaging in studies in the Department of
Physical Chemistry, Cambridge University. He plans to return in June, 1950.


Word has just been received here of the formation of the New Zealand Statistical
Association. The initial meeting was held in August, 1948 at Victoria University
College. The officers are: J. T. Campbell, President; I. D. Dick, Secretary.
It is planned to hold one formal meeting a year at first with the hope of increasing
this later. The main interest in statistical work in New Zealand has been
biological, but there is scope for considerable extension to industrial, educational and
economic fields and it is hoped the formation of the Association will assist in this
extension.


New Members 

The following persons have been elected to Membership in the Institute
(June 1, 1940 to August ti, 1949 ) 

Al-Doori, Younis A., Student at the University of California, 1815 Henry Street, Berkeley,
California.





Slsb4r, Robert A., A f» (Uinv of falif ) /f.ff T*rwe. Rirh^nA, California. 

Bttla, CloUlde Angelica, I'h D, fl'nv of IW,nT*w, Ar&'iitma ■ fYofrwor, University of 
BttPfwa Aims, Rioja. S9S!, trim* t.-y A* .Ilf'*, Argentina 

Dal riel, Edwin R,, Ph I> il'nlv . I Mtuhurgh' AwretsM Mrwiw, IMiupmloa North Techni- 
ml Rfln.nl, PalmsHrton Norik, Nw Zealand 

Douglas, JamesIi,, Dip, Erf. <Mrtfr**tirn I'mv < let'Utrrr m Meitemutmi*. NeweanUeTech* 
aieal(WleRe.Tigte'aHi!!2N, N H W , At*»<r«l>a. 

Hartley, Herman O., Ph.D. (Cambridge Univ.) Lecturer in Statistics, Department of
Statistics, University College, London, W.C. 1, England.

Immel, Krlc R.,M.A. (Qurwfs r«iv„, KtagNlnti,f‘anada Tearhing Awslanl «nd Gradual® 
Student, Department of Mattentat tra, Dweereit* «f OM«fn».i a! Ie*» Angeles, Los 
Angalw. California. 

Kelly, John P., Senior Technical Engineer, f*arl«id«» nod Garten Chemical Corporation, 
Oak Ridge, Tenoraww*, /’ O Rm (?A, Sunn*, 

Pare!, Cristina P., M.H, tt’niv of Michigan; Iwinn-mr. Ifc>j*irtiwtil nl Mathematics, 
University of the Philippines, Manila. I* 1 

Phlllpson, Carl 0., II,He. (tlniv of Stockholm I Artuary «f frdte*. R.tmiirteU*. Vnpmoftn 
H, tfjumhoUn, 

Porter, Robert A., Pk.D. (N,f\ Stale College, Raleigh, N V ) Hi m«r Mathernatirian, Uni¬ 
versity of Chicago, 171 !5 Long/rUme Awmut, l/««n.«l, l.’iiaii 

Rippe, Dayle D., M.A. (Univ. of Nebr.) Student, Teaching Fellow, Department of
Mathematics, University of Michigan, /flftfWVJWn (“««,»/. Willow Run, Michigan.

Rogers, Robert L., A.B. (Univ. of Calif.) Student at University of California, flfmifc i,
Hm74, Dmio Avenue, Gilroy, California.

Roy, Samarendra N., M.Sc. (Calcutta Univ.) Head of Department of Statistics, Calcutta
University and Assistant Director, Indian Statistical Institute (now on leave) P.O.
Box MS, Chapel Hill, North Carolina.

Savoy, Rosemary, M.B.A. (Univ. of Wisc.) Graduate Assistant and Student, University of
Wisconsin, g$lS Naromd Place, Madison 5, Wisconsin.


REPORT ON THE BOULDER MEETING OF THE INSTITUTE 

The Twelfth Summer Meeting of the Institute of Mathematical Statistics
was held at the University of Colorado, Boulder, Colorado, Monday, August 29
through Thursday, September 1, 1949. The meeting was held in conjunction
with the summer meetings of the American Mathematical Society, the Mathematical
Association of America, and the Econometric Society. The meeting was
attended by the following 79 members of the Institute:

S. P. Agarwal, R. L. Anderson, T. W. Anderson, V. L. Anderson, K. J. Arnold, E. W.
Barankin, G. A. Bennett, Agnes Berger, B. li, Btanohi), A. H. Bowker, J. C. Brixey, Jean
Bronfenbrenner, J. H. Bushey, H. C. Carver, Herman Chernoff, K. L. Chung, A. 0, Clark,
K. P. Coleman, E. L. Crow, J. H. Curtiss, W. J. Dixon, J. L. Doob, Aryeh Dvoretzky,
H. P. Evans, W. D. Evans, W. T. Federer, William Feller, C. H. Fischer, J. B. Frame, T. C.
Fry, H. M. Gehman, H. H. Germond, R. E. Greenwood, H. T. Guard, P. R. Halmos, J. L.
Hodges, P. G. Hoel, Harold Hotelling, J. M. Howell, C. G. Hurd, C. A. Hutchinson, Irving
Kaplansky, Leo Katz, H. S. Konijn, T. C. Koopmans, G. M. Kuznets, H. D. Larsen, D. H.
Leavens, S. B. Littauer, H. B. Mann, Jacob Marschak, F. J. Massey, Dorothy J. Morrow,
Jerzy Neyman, M. L. Norden, J. I. Northam, B. G. Olds, R. P. Peterson, G. B. Price,
Mina S. Rees, P. R. Rider, F. D. Rigby, Herman Rubin, L. J. Savage, Elisabeth R. Scott,
I. E. Segal, Esther Seiden, Jack Sherman, W. B. Simpson, Milton Sobel, D. M. Studley,
B. R. Suydam, A. G. Swanson, James Templeton, R. M. Thrall, J. W. Tukey, Abraham
Wald, John Wishart, S. S. Wilks.

The Monday afternoon session was devoted to invited addresses with Professor
Leonard J. Savage of the University of Chicago presiding. The attendance
was approximately fifty. Professor J. L. Hodges of the University of California
presented a paper, Some Problems in Point Estimation, and Professor W. T.
Federer of Cornell University presented A Comparison of the Proportionality of
Covariance Matrices.

On Tuesday morning the Institute, the Mathematical Association of America,
and the Econometric Society held a joint symposium on Mathematical Training
for Social Scientists. Professor Jacob Marschak of the Cowles Commission for
Research in Economics presided. The attendance was approximately one hundred
fifty. The participating speakers were: Professor R. L. Anderson of North
Carolina State College; Professor T. W. Anderson of Columbia University;
Professor G. C. Evans of the University of California; Professor F. L. Griffin
of Reed College; Professor Harold Gulliksen of Educational Testing Service;
Professor William Jaffé of Northwestern University; Professor Harold Hotelling
of the University of North Carolina; and Professor G. M. Kuznets of the University
of California. At the end of the session the following resolution was
adopted by those in attendance at the meeting:

Members of the Mathematical Association of America, the Institute of Mathematical
Statistics, and the Econometric Society assembled in a joint session in Boulder, Colorado,
on August 30, 1949, are of the opinion that officers of these societies should study the
need for better mathematical training of social scientists, and the ways and means to
improve mathematical preparation of social scientists, and that such a study may be
most effectively conducted by a joint committee, possibly in co-operation with other
interested societies, and in close touch with the Social Science Research Council, the
National Research Council, or other national bodies concerned with general education
and research. It is suggested that this committee report the results of its deliberations
at the next joint meeting of the original participating societies.

The two joint sessions of the Institute and the Econometric Society were
devoted to a Symposium on Statistical Inference in Decision Making. Professor
Jerzy Neyman of the University of California presided on Tuesday afternoon.
The attendance was approximately eighty. Professor Aryeh Dvoretzky of
Hebrew University, Jerusalem presented Decision Problems and Professor Abraham
Wald of Columbia University presented Some Recent Results in the Theory
of Statistical Decision Functions. On Wednesday morning, under the chairmanship
of Professor Wald and with an attendance of approximately seventy-five, the
following papers were presented: Remarks on a Rational Selection of a Decision
Function by Professor Herman Chernoff of the Cowles Commission for Research
in Economics; Psychological Probabilities by Professor Leonard J. Savage; and
Complete Classes of Decision Functions for Some Standard Sequential and Nonsequential
Problems by Milton Sobel of Columbia University.



iftgfon University presiding. The attendance was approximately seventy-five.

The following papers were presented:

S Stnufta* 4 fUMUtnd fflmntt 
Mr. Dwm M, iHstikf. ivjorvV, %nkp, 

2 , tfa H# MkkHx# $0nmmm mf &|V 
Professor Leo Katz, Michigan State College
3. Adjustment of an Inverse Matrix Corresponding to Changes in the Elements of a Given
Column or a Given Row of the Original Matrix.

Dr. Jack Sherman and Miss Winifred J. Morrison, The Texas Company Research
Laboratories, Beacon, New York.

4. On the Problem of Optimum Classification.
Professor Paul G. Hoel, University of California at Los Angeles

I, Ophmd iiwf httiutim <tf fr'«»w trV>*» fVmtmoft* #*v» tfrrm't P#«. 

turn 

Profwtwr f* 1, IXdjito a*»4 Ur M A Cwvetkiv «■( 

6. The Integral of the Gaussian Distribution over the Area Bounded by an Ellipse.

Dr. H. H. Germond, RAND Corporation, Santa Monica, California

7 , Thnrm* tm M«Nb|i«i mfti Aynmttrk Cm/mmlt. 

(By KlleS 

Dr. Mark I'mwwiy »f Kamo* Olv 

8. L*tr$t, tiamplt TmU ml C<wW«tf# intermit fat s Hahn, {By litJel 

Dr. i. B, ViM, Itad (JarjiaiaBon, f»a«i* M»air.», VnUUtvm 

9. Partial Sums of the Negative Binomial in Terms of the Incomplete Beta Function. (By
title)

Dr. Julius Lieblein, National Bureau of Standards

On Thursday afternoon Professor Jerzy Neyman presented the Second Rietz
Memorial Lecture on Consistent Estimates of the Linear Structural Relation in
the General Case of Identifiability. Professor Harold Hotelling presided and the
attendance was approximately fifty. Dr. R. P. Boas, Jr. of Mathematical Reviews
presented an invited address, The Representation of Probability Distributions by
Charlier Series.

The Institute sponsored a beer party on Tuesday evening and on Thursday
evening a fry was held on Flagstaff Mountain.


Hikhw T. Ghabb 






